Abstract. In this paper, we address the problem of dimension reduction
for sequentially observed functional data $(X_k : k \in \mathbb{Z})$. Such
functional time series arise frequently, e.g., when a continuous time pro- 
cess is segmented into some smaller natural units, such as days. Then 
each $X_k$ represents one intraday curve. We argue that functional principal
component analysis (FPCA), though a key technique in the field and
a benchmark for any competitor, does not provide an adequate dimen- 
sion reduction in a time series setting. FPCA is a static procedure which 
ignores valuable information in the serial dependence of the functional 
data. Therefore, inspired by Brillinger's theory of dynamic principal 
components, we propose a dynamic version of FPCA which is based on 
a frequency domain approach. By means of a simulation study and an 
empirical illustration, we show the considerable improvement our method 
entails when compared to the usual (static) procedure. While the main 
part of the article outlines the ideas and the implementation of dynamic 
FPCA for functional $X_k$, we provide in the appendices a rigorous theory
for general Hilbertian data. 

1. Introduction 

The tremendous technical improvements in data collection and storage make it
possible to obtain an increasingly complete picture of many common phenomena. In
principle, all processes in real life are continuous in time and, with improved
data acquisition techniques, they can be recorded at arbitrarily high frequency.
To benefit from this increasing information, we need appropriate statistical
tools that can help us extract the most important characteristics of some
possibly high-dimensional specifications. Functional data analysis (FDA) has
proven in recent years to be an appropriate tool in many such cases and has
consequently evolved into a very important field of research in the statistical
community.

Most classically, functional data are considered as realizations of (smooth)
random curves. Then every observation $X$ is a curve
$(X(u) : u \in \mathcal{U})$. One generally assumes, for simplicity, that
$\mathcal{U} = [0, 1]$, but $\mathcal{U}$ could be a more complex domain like a
cube or the surface of a sphere. Since observations are functions, we are
dealing with high-dimensional, in fact intrinsically infinite-dimensional,
objects. So, not surprisingly, there is a clear demand for efficient data
reduction techniques. As such, functional principal component analysis (FPCA)
has taken a leading role in FDA.
Arguably, it can be seen as the key technique in the field. In analogy to
classical multivariate PCA (see Jolliffe [21]), functional PCA relies heavily on
an eigendecomposition of the underlying covariance function. The mathematical
foundations for this were laid several decades ago in the pioneering papers by
Karhunen [22] and Loève [25], but it took a while until the method was
popularized in the statistical community. Some earlier contributions are Besse
and Ramsay [4], Ramsay and Dalzell [28] and, later, the influential books by
Ramsay and Silverman [29], [30] and Ferraty and Vieu [10]. Statisticians have
been working on problems related to estimation and inference (Kneip and Utikal
[23], Benko et al. [3]), asymptotics (Dauxois et al. [9] and Hall and
Hosseini-Nasab [14]), smoothing techniques (Silverman [32]), sparse data (James
et al. [20], Hall et al. [15]), and robustness issues (Locantore et al. [24],
Gervini [11]), to name just a few. Important applications include FPC-based
estimation of functional linear models (Cardot et al. [8], Reiss and Ogden [31])
or forecasting (Hyndman and Ullah [19], Aue et al. [1]). The usefulness of
functional PCA has also been recognized in other scientific disciplines, like
chemical engineering (Gokulakrishnan et al. [13]) or functional magnetic
resonance imaging (Aston and Kirch [2], Viviani et al. [34]). Many more
references can be found in the above cited papers and in Sections 8-10 of Ramsay
and Silverman [30], to which we refer for background reading. A further reason
for the success of FPCA seems to be the fact that, in contrast to their
multivariate counterparts, FPCs do not suffer from the lack of scale invariance.
Roughly speaking, while in the vector case different components can have
completely different measuring units, all points $X(u)$, $u \in [0, 1]$, of some
curve are expressed in the same units, and rescaling at different values of $u$
is usually not meaningful.

Most existing concepts and methods in FDA, even though they may tolerate
serial dependence, have been developed for independent observations. This is a
serious weakness, as in numerous applications the functional data under study
are obviously dependent, either in time or in space. Examples include daily
curves of financial transactions, daily patterns of geophysical and
environmental data, annual temperatures measured on the surface of the earth,
etc. In such cases, we should view the data as the realization of a functional
time series $(X_t(u) : t \in \mathbb{Z})$, where the time parameter $t$ is
discrete and the parameter $u$ is continuous. For example, in the case of daily
observations, the curve $X_t(u)$ may be viewed as the observation on day $t$
with intraday time parameter $u$. A key reference on functional time series
techniques is Bosq [7], who studied functional versions of AR processes. We
also refer to Hörmann and Kokoszka [18] for a survey.

Ignoring time dependence in this time series context may result in misleading,
or even completely wrong, findings, and in highly inefficient procedures.
Similar conclusions motivated Hörmann and Kokoszka [17] to investigate the
robustness properties of some classical FDA methods in the presence of serial
dependence. In particular, they show that the usual FPCs can still be
consistently estimated within a quite general dependence framework. Yet, the
basic problem remains that FPCA operates in a static way: when applied to
serially dependent curves, it fails to take into account the potentially very
valuable information carried by the past values of the functional observations
under study. In particular, a static FPC with small eigenvalue, hence
negligible instantaneous impact on $X_t$, may have a major impact on
$X_{t+1}$, and high predictive value. Neglecting it, as FPCA does, may have
serious consequences.

Besides their failure to produce adequate dimension reduction, static FPCs, while 
cross-sectionally uncorrelated at fixed time t, typically still exhibit lagged cross- 
correlations. Therefore, unlike in the i.i.d. case, the resulting FPC scores cannot be 
analyzed componentwise, but need to be considered as vector time series, which are 
less easy to handle and interpret. 

These shortcomings motivated our development of dynamic functional principal 
components. The idea is to transform the functional time series into a vector time se- 
ries (of low dimension 3 or 4, say), where the individual component processes are mu- 
tually uncorrelated, and account for most of the dynamics and variability of the origi- 
nal process. The analysis of the functional time series can then be performed on those 
dynamic principal components. Since the transformed variables are non-correlated, 
we can even perform any second-order based analysis componentwise. In analogy to 
the static FPCA, the curves can be optimally reconstructed/approximated from the 
low dimensional time series via a dynamic version of the celebrated Karhunen-Loeve 
expansion. 

Dynamic PCs were first suggested by Brillinger [5] for vector time series. The
purpose of this article is to extend the Brillinger approach to a functional, or
more general Hilbert space, setting. The methodology relies heavily on a
frequency domain analysis for functional data, which has only recently been
brought forth by Panaretos and Tavakoli [26].

An impression of how well the proposed method works can be obtained from
Figure 1. Its left panel shows ten consecutive intraday curves of some pollutant
level. (A detailed description of the underlying data is given in Section 5.)
The two panels to the right show the reconstructions of these curves after a
reduction to dimension one. We used static FPCA in the central panel and dynamic
FPCA in the right panel. The difference is striking. While the static method
solely reproduces an average level and exhibits a spurious intraday symmetry,
the dynamic counterpart to a large extent catches the evolution of the curves.
In particular, it retrieves remarkably well the intraday trend of the pollution
levels.

The rest of the paper is organized as follows. In Section 2 we describe our
approach and state a number of relevant propositions. In Section 3 we discuss
its practical implementation and the related numerical costs. After a simulation
study in Section 4, we illustrate the methodology in Section 5 by a real data
example on pollution curves. Appendix A contains a rigorous mathematical
framework and the proofs in a general Hilbertian setting. Finally, in Appendix
B, we justify the proposed estimation steps by providing some asymptotics.

[Figure 1 omitted: three panels (left, middle, right), each with horizontal axis
"Intraday time" running from 0.0 to 1.0.]

Figure 1: Ten subsequent observations (left panel), the corresponding static
Karhunen-Loève expansion with one component (middle panel) and the
dynamic Karhunen-Loève expansion with one component (right panel).



2. Methodology for $L^2$ curves

In this section, we introduce some necessary notation and tools. Most of the
discussion of technical details is postponed to Appendices A and B. While
focusing here on $L^2([0, 1])$-valued processes, i.e. on processes taking values
in the space of square-integrable functions defined on the unit interval, we
will work, in the technical appendices, with processes taking values in some
arbitrary separable Hilbert space. Such a general setup facilitates notation and
makes the theory clearer, but we postpone it until Appendix A, to make the paper
more easily accessible to readers less familiar with functional analysis.



2.1. Notation and setup 

Throughout this section, we consider a functional time series
$(X_t : t \in \mathbb{Z})$, where $X_t$ takes values in the space
$H := L^2([0, 1])$ of complex-valued square-integrable functions on $[0, 1]$.
This means that $X_t = (X_t(u) : u \in [0, 1])$ and

$$\int_0^1 |X_t(u)|^2 \, du < \infty,$$

where $|z| := \sqrt{z\bar{z}}$, with $\bar{z}$ the complex conjugate of $z$,
denotes the modulus of $z \in \mathbb{C}$. In most applications, observations
are real, but, since we will use spectral methods, a complex vector space
definition will prove useful.

The space $H$ then is a Hilbert space, equipped with the inner product
$\langle x, y \rangle := \int_0^1 x(t)\overline{y(t)}\,dt$, so that
$\|x\| := \sqrt{\langle x, x \rangle}$ defines a norm. The notation
$X \in L^p_H$ is used to indicate that, for some $p > 0$,
$E[\|X\|^p] < \infty$. Any $X \in L^1_H$ then possesses a mean curve
$\mu = (E[X(t)] : t \in [0, 1])$, and any $X \in L^2_H$ a covariance operator
$C$, defined by $C(x) := E[\langle X - \mu, x \rangle (X - \mu)]$. The operator
$C$ is a kernel operator given by

$$C(x)(u) = \int_0^1 c(u, v)\, x(v)\, dv, \quad \text{with } c(u, v) := \mathrm{cov}(X(u), X(v)), \quad u, v \in [0, 1].$$

The process $(X_t : t \in \mathbb{Z})$ is called weakly stationary if, for all
$t$, we have (i) $X_t \in L^2_H$, (ii) $EX_t = EX_0$, and (iii) for all
$h \in \mathbb{Z}$ and $u, v \in [0, 1]$,

$$\mathrm{cov}(X_{t+h}(u), X_t(v)) = \mathrm{cov}(X_h(u), X_0(v)) =: c_h(u, v).$$

Denote by $C_h$, $h \in \mathbb{Z}$, the operator corresponding to the
autocovariance kernel $c_h$. Clearly, $C_0 = C$. For our problem, the mean is
not important, so we will suppose throughout that random elements are centered.
For the rest of the paper, it will be tacitly assumed that
$(X_t : t \in \mathbb{Z})$ is a weakly stationary, zero mean process defined on
some probability space $(\Omega, \mathcal{A}, P)$.

As in the multivariate case, the covariance operator $C$ of a random element
$X \in L^2_H$ admits an eigendecomposition (see, e.g., p. 178, Theorem 5.1 in
[12])

$$C(x) = \sum_{\ell=1}^{\infty} \lambda_\ell \langle x, v_\ell \rangle v_\ell,$$

where $(\lambda_\ell : \ell \ge 1)$ are $C$'s eigenvalues (in descending order)
and $(v_\ell : \ell \ge 1)$ the corresponding normalized eigenfunctions, so that
$C(v_\ell) = \lambda_\ell v_\ell$ and $\|v_\ell\| = 1$. If $C$ has full rank,
then the sequence $(v_\ell : \ell \ge 1)$ forms an orthonormal basis (ONB) of
$L^2([0, 1])$. Hence $X$ admits the representation

$$X = \sum_{\ell=1}^{\infty} \langle X, v_\ell \rangle v_\ell, \tag{1}$$

which is called the Karhunen-Loève (KL) expansion of $X$. The eigenfunctions
$v_\ell$ are called the (static) functional principal components (FPCs) and the
coefficients $\langle X, v_\ell \rangle$ are called the (static) FPC scores or
loadings. It is well known that the basis $(v_\ell : \ell \ge 1)$ is optimal in
representing $X$ in the following sense: if $(w_\ell : \ell \ge 1)$ is any other
ONB of $H$, then

$$E\Big\|X - \sum_{\ell=1}^{p} \langle X, v_\ell \rangle v_\ell\Big\|^2 \le E\Big\|X - \sum_{\ell=1}^{p} \langle X, w_\ell \rangle w_\ell\Big\|^2, \quad \forall p \ge 1. \tag{2}$$

Property (2) shows that a finite number of FPCs can be used to transform the
function $X$ to a vector of given dimension $p$ with a minimum loss of
"instantaneous" information. It should be noted, though, that this
transformation is static in nature, meaning that it is performed observation by
observation, and does not take into account the possible serial dependence of
the $X_t$'s, which is likely to exist in a time series context. Globally
speaking, we should be looking for a transformation which involves all
observations, and is based on the whole family $(C_h : h \in \mathbb{Z})$ rather
than on $C_0$ only. To achieve this goal, we introduce below the spectral density
operator, which contains the full information on the family of operators
$(C_h : h \in \mathbb{Z})$.
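As a minimal numerical sketch of the optimality property (2), the following NumPy snippet works with coefficient vectors instead of curves. It assumes (this is not stated at this point of the text) that the curves are expanded in an orthonormal basis, so that $L^2$ norms coincide with Euclidean norms of the coefficients. The function name `static_fpca` and the toy data are purely illustrative.

```python
import numpy as np

def static_fpca(X, p):
    """Static FPCA on centered coefficient vectors X of shape (n, d).

    Assumes an orthonormal basis, so that the covariance operator is
    represented by the d x d empirical covariance matrix of the coefficients.
    """
    n = X.shape[0]
    C = X.T @ X / n                       # empirical covariance matrix
    lam, V = np.linalg.eigh(C)            # eigenvalues in ascending order
    lam, V = lam[::-1], V[:, ::-1]        # reorder: descending eigenvalues
    scores = X @ V[:, :p]                 # static FPC scores <X_t, v_l>
    X_hat = scores @ V[:, :p].T           # p-term Karhunen-Loeve reconstruction
    return lam, scores, X_hat

# Toy data: 200 centered coefficient vectors in dimension 6.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2, 0.1])
X -= X.mean(axis=0)

lam, scores, X_hat = static_fpca(X, p=2)
# Empirical mean squared error of the 2-term reconstruction; up to rounding
# it equals the sum of the discarded eigenvalues, lam[2:].sum().
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
```

Replacing the leading eigenvectors by the first two columns of any other orthonormal matrix can only increase `mse`, which is the empirical counterpart of (2).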



2.2. The Spectral Density Operator 

Existence of the operator to be defined below requires a summability condition
on the autocovariance operators $C_h$. Specifically, we assume that

$$\sum_{h \in \mathbb{Z}} \Big( \int_0^1 \int_0^1 |c_h(u, v)|^2 \, du \, dv \Big)^{1/2} < \infty, \tag{3}$$

a condition that is more conveniently expressed as

$$\sum_{h \in \mathbb{Z}} \|C_h\|_{\mathcal{S}} < \infty, \tag{4}$$

where $\|\cdot\|_{\mathcal{S}}$ denotes the Hilbert-Schmidt norm (see Section
A.1). A simple sufficient condition for (3) will be provided in Proposition 7.
Now, set

$$f_\theta^X(u, v) := \frac{1}{2\pi} \sum_{h \in \mathbb{Z}} c_h(u, v) e^{-ih\theta}, \quad \theta \in [-\pi, \pi],$$

where $i$ denotes the imaginary unit. By (3), this series converges in mean
square for all $\theta$.

Definition 1. Let $(X_t)$ be a stationary process. The operator
$\mathcal{F}_\theta^X$ whose kernel is $f_\theta^X(\cdot, \cdot)$ is called the
spectral density operator of $(X_t)$ at frequency $\theta$.

This concept of a spectral density operator has been introduced very recently
by Panaretos and Tavakoli [26], to which we refer for many interesting details
on estimation and asymptotics. In our context, this operator is used to create
particular functional filters (see Sections 2.3 and A.3), which are the building
blocks for the construction of dynamic FPCs. A functional filter is defined via
a sequence $\Phi = (\Phi_\ell : \ell \in \mathbb{Z})$ of linear operators
between two spaces $H$ and $H'$. The filtered variables $Y_t$ have the form
$Y_t = \sum_{\ell \in \mathbb{Z}} \Phi_\ell(X_{t-\ell})$. For the important case
when $H' = \mathbb{R}^p$, the following proposition relates the spectral density
operator of $(X_t)$ to the spectral density matrix of such a filtered sequence
$(Y_t)$. This simple result plays a crucial role in our construction. Let
$\|\Psi\| := \sup_{\|x\| \le 1} \|\Psi(x)\|$ denote the operator norm of some
operator $\Psi$.

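Proposition 1. Assume that
$\Phi_\ell(x) = (\langle x, \phi_{1\ell} \rangle, \langle x, \phi_{2\ell} \rangle, \ldots, \langle x, \phi_{p\ell} \rangle)'$,
with $\phi_{m\ell} \in H$ and $\sum_{\ell \in \mathbb{Z}} \|\Phi_\ell\| < \infty$.
Then $\sum_{\ell \in \mathbb{Z}} \Phi_\ell(X_{t-\ell})$ converges in mean square
to a limit $Y_t$. The $p$-dimensional vector process $(Y_t)$ is stationary and
has a spectral density matrix $\mathcal{F}_\theta^Y$ given by

$$\mathcal{F}_\theta^Y = \begin{pmatrix}
\langle \mathcal{F}_\theta^X(\phi_1^*(\theta)), \phi_1^*(\theta) \rangle & \cdots & \langle \mathcal{F}_\theta^X(\phi_p^*(\theta)), \phi_1^*(\theta) \rangle \\
\vdots & \ddots & \vdots \\
\langle \mathcal{F}_\theta^X(\phi_1^*(\theta)), \phi_p^*(\theta) \rangle & \cdots & \langle \mathcal{F}_\theta^X(\phi_p^*(\theta)), \phi_p^*(\theta) \rangle
\end{pmatrix},$$

where $\phi_m^*(\theta) := \sum_{\ell \in \mathbb{Z}} \phi_{m\ell} e^{i\ell\theta}$.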
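The structure of the spectral density matrix of $(Y_t)$ can be checked by a short formal computation. The sketch below is not taken from the appendices: it assumes that all interchanges of sums and expectations are justified by the summability conditions, and it uses the convention $\langle C_h(y), x \rangle = E[\langle X_h, x \rangle \overline{\langle X_0, y \rangle}]$, which corresponds to reading $\mathrm{cov}(X, Y)$ as $E[X\bar{Y}]$ for centered variables.

```latex
\begin{align*}
\operatorname{cov}(Y_{m,t+h}, Y_{m',t})
  &= \sum_{\ell\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}
     E\big[\langle X_{t+h-\ell}, \phi_{m\ell}\rangle\,
           \overline{\langle X_{t-k}, \phi_{m'k}\rangle}\big]
   = \sum_{\ell\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}
     \langle C_{h-\ell+k}(\phi_{m'k}), \phi_{m\ell}\rangle, \\
\frac{1}{2\pi}\sum_{h\in\mathbb{Z}}
  \operatorname{cov}(Y_{m,t+h}, Y_{m',t})\, e^{-ih\theta}
  &= \sum_{\ell\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}
     \Big\langle \Big(\frac{1}{2\pi}\sum_{j\in\mathbb{Z}} C_j e^{-ij\theta}\Big)
       \big(\phi_{m'k}\, e^{ik\theta}\big),\; \phi_{m\ell}\, e^{i\ell\theta}
     \Big\rangle
     \qquad (j = h - \ell + k) \\
  &= \big\langle \mathcal{F}_\theta^X\big(\phi^*_{m'}(\theta)\big),\,
       \phi^*_m(\theta) \big\rangle .
\end{align*}
```

Under these assumptions, the entry in row $m$ and column $m'$ is $\langle \mathcal{F}_\theta^X(\phi^*_{m'}(\theta)), \phi^*_m(\theta) \rangle$; it vanishes for $m \ne m'$ whenever $\phi^*_m(\theta)$ and $\phi^*_{m'}(\theta)$ are orthogonal eigenfunctions of $\mathcal{F}_\theta^X$.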

To explain the important consequence of this result, first observe that, for
every frequency $\theta$, the operator $\mathcal{F}_\theta^X$ is a non-negative,
self-adjoint Hilbert-Schmidt operator. Hence, assuming for the moment that the
kernel $f_\theta^X(u, v)$ is continuous in $u$ and $v$, we obtain, by Mercer's
theorem (see, e.g., p. 197, Theorem 3.1 in [12]),

$$f_\theta^X(u, v) = \sum_{m \ge 1} \lambda_m(\theta)\, \varphi_m(u|\theta)\, \overline{\varphi_m(v|\theta)}. \tag{5}$$

Here, $\varphi_m(u|\theta)$ (in short, $\varphi_m(\theta)$) and
$\lambda_m(\theta)$ are the eigenfunctions and eigenvalues, respectively, of
$\mathcal{F}_\theta^X$. The series (5) converges absolutely and uniformly on
$[0, 1]^2$. We impose the order
$\lambda_1(\theta) \ge \lambda_2(\theta) \ge \cdots \ge 0$ for all
$\theta \in [-\pi, \pi]$, and require that the eigenfunctions be standardized,
so that $\|\varphi_m(\theta)\| = 1$ for all $m \ge 1$ and
$\theta \in [-\pi, \pi]$. Then the sequences $(\varphi_m(\theta) : m \ge 1)$
form orthonormal bases of the closure $\overline{\mathrm{Im}(\mathcal{F}_\theta^X)}$
of the image of $\mathcal{F}_\theta^X$. If $\mathcal{F}_\theta^X$ is not
full-rank, we can always extend $(\varphi_m(\theta) : m \ge 1)$ into a basis of
$H$, and thus, without loss of generality, we assume that the closed span
$\overline{\mathrm{sp}}(\varphi_m(\theta) : m \ge 1)$ is $H$.

Assume now that we could choose the functional filters
$(\phi_{m\ell} : \ell \in \mathbb{Z})$ such that
$\phi_m^*(\theta) = \varphi_m(\theta)$. We then have
$\mathcal{F}_\theta^Y = \mathrm{diag}(\lambda_1(\theta), \ldots, \lambda_p(\theta))$,
implying that the coordinate processes of $(Y_t)$ are uncorrelated at any lag:
$\mathrm{cov}(Y_{mt}, Y_{m's}) = 0$ for all $s, t$ if $m \ne m'$. As discussed in
the Introduction, this is a highly desirable property which the static FPCs do
not possess.

2.3. Dynamic FPCs 

Motivated by the discussion above, we wish to define $\phi_{m\ell}$ in such a
way that

$$\phi_m^*(\theta) = \varphi_m(\theta),$$

which is the case if the $\phi_{m\ell}$'s are the coefficients of the Fourier
expansion of $\varphi_m(\theta)$ as a function in $\theta$. Since
$\phi_{m\ell} = \phi_{m\ell}(u)$ is a curve, the concept of Fourier expansion
requires some explanation here. By Fubini's theorem,

$$\int_{-\pi}^{\pi} \int_0^1 \varphi_m^2(u|\theta) \, du \, d\theta = \int_0^1 \int_{-\pi}^{\pi} \varphi_m^2(u|\theta) \, d\theta \, du,$$

showing that, for all $m$ and almost all $u \in [0, 1]$, $\varphi_m(u|\theta)$
is square integrable with respect to $\theta \in [-\pi, \pi]$. Thus, for almost
all $u$, the Fourier expansion of $\varphi_m(u|\theta)$ as a function in
$\theta$ exists and takes the form

$$\varphi_m(u|\theta) = \sum_{\ell \in \mathbb{Z}} \Big( \frac{1}{2\pi} \int_{-\pi}^{\pi} \varphi_m(u|s) e^{-i\ell s} \, ds \Big) e^{i\ell\theta} =: \sum_{\ell \in \mathbb{Z}} \phi_{m\ell}(u) e^{i\ell\theta}. \tag{6}$$

This leads to the following definition.

Definition 2 (Dynamic functional principal components). Assume that
$(X_t : t \in \mathbb{Z})$ is a mean zero stationary process with values in
$L^2_H$ satisfying assumption (3). Let $\phi_{m\ell}$ be defined as in (6). Then
the $m$-th dynamic functional principal component (DFPC) score of $X_t$ is

$$Y_{mt} := \sum_{\ell \in \mathbb{Z}} \langle X_{t-\ell}, \phi_{m\ell} \rangle, \quad t \in \mathbb{Z}, \; m \ge 1. \tag{7}$$

We call $\Phi_m := (\phi_{m\ell} : \ell \in \mathbb{Z})$ the $m$-th DFPC filter
coefficients.

The rest of this section is devoted to some important properties of dynamic FPCs. 

Proposition 2 (Elementary properties). Assume that $(X_t : t \in \mathbb{Z})$ is
a real-valued stationary process satisfying (3) and let $Y_{mt}$ be its dynamic
FPC scores. Then,

(a) the eigenfunctions $\varphi_m(\theta)$ are Hermitian, and hence $Y_{mt}$ is
real;

(b) if $C_h = 0$ for $h \ne 0$, the dynamic FPC scores coincide with the static
ones.

Our construction of $\phi_{m\ell}$ was motivated through Proposition 1. In order
to apply it to the functional filters thus defined, we shall now impose, for
some of the subsequent results, that

$$\sum_{\ell \in \mathbb{Z}} \|\phi_{m\ell}\| < \infty. \tag{8}$$



Proposition 3 (Second-order properties). Assume $(X_t : t \in \mathbb{Z})$ is a
stationary process satisfying (3) and let $Y_{mt}$ be its dynamic FPC scores.
Then,

(a) the series defining $Y_{mt}$ is mean-square convergent, with

$$EY_{mt} = 0 \quad \text{and} \quad EY_{mt}^2 = \sum_{\ell \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \langle C_{\ell-k}(\phi_{m\ell}), \phi_{mk} \rangle.$$

Assume that, in addition to the previous assumptions, (8) holds. Then,

(b) for $m \ne m'$, the dynamic FPC scores $Y_{mt}$ and $Y_{m's}$ are
uncorrelated for all $s, t$;

(c) the long-run variance of the $m$-th dynamic FPC score sequence is

$$\lim_{n \to \infty} \frac{1}{n} \mathrm{Var}(Y_{m1} + \cdots + Y_{mn}) = 2\pi\lambda_m(0).$$



It is important to note that Part (a) of Proposition 3 holds without assumption
(8). Thus, the definition of $Y_{mt}$ is always meaningful. However, for the
derivation of more general properties of dynamic FPC scores, condition (8) seems
to be a minimal requirement.

The next proposition tells us how we can recover the original process
$(X_t(u) : t \in \mathbb{Z}, u \in [0, 1])$ from
$(Y_{mt} : t \in \mathbb{Z}, m \ge 1)$. It is the dynamic analogue of the static
Karhunen-Loève expansion (1) associated with static principal components.

Proposition 4 (Inversion formula). Let $Y_{mt}$ be the DFPC scores related to
the process $(X_t(u) : t \in \mathbb{Z}, u \in [0, 1])$. Assume that (8) holds.
Then,

$$X_t(u) = \sum_{m \ge 1} X_{mt}(u) \quad \text{with} \quad X_{mt}(u) := \sum_{\ell \in \mathbb{Z}} Y_{m,t+\ell}\, \phi_{m\ell}(u) \tag{9}$$

(where convergence is in mean square). We call (9) the dynamic Karhunen-Loève
expansion of $X_t$.

The random variables defined by $\sum_{m=1}^{p} X_{mt}(u)$, $p \ge 1$, can be
seen as $p$-dimensional reconstructions of $X_t(u)$, which only involve the $p$
time series $(Y_{mt} : t \in \mathbb{Z})$, $1 \le m \le p$. Competitors to this
reconstruction are obtained by replacing $\phi_{m\ell}$ in (7) and (9) with
other elements $\psi_{m\ell}$ and $\upsilon_{m\ell}$. The next proposition shows
that, within this class of $p$-dimensional processes, $\sum_{m=1}^{p} X_{mt}(u)$
approximates $X_t(u)$ in an optimal way.

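Proposition 5 (Optimality). Let $Y_{mt}$ be the DFPC scores related to the
process $(X_t : t \in \mathbb{Z})$, let $X_{mt}$ be defined as in Proposition 4,
and assume that (8) holds. Let
$\tilde{X}_{mt} = \sum_{\ell \in \mathbb{Z}} \tilde{Y}_{m,t+\ell}\, \upsilon_{m\ell}$,
with $\tilde{Y}_{mt} = \sum_{\ell \in \mathbb{Z}} \langle X_{t-\ell}, \psi_{m\ell} \rangle$,
where $(\psi_{mk} : k \in \mathbb{Z})$ and $(\upsilon_{mk} : k \in \mathbb{Z})$
are elements of $H$ such that $\sum_{k \in \mathbb{Z}} \|\psi_{mk}\| < \infty$
and $\sum_{k \in \mathbb{Z}} \|\upsilon_{mk}\| < \infty$. Then,

$$E\Big\|X_t - \sum_{m=1}^{p} X_{mt}\Big\|^2 = \sum_{m > p} \int_{-\pi}^{\pi} \lambda_m(\theta) \, d\theta \le E\Big\|X_t - \sum_{m=1}^{p} \tilde{X}_{mt}\Big\|^2, \quad \forall p \ge 1. \tag{10}$$

Inequality (10) can be interpreted as the dynamic version of (2). Proposition 5
also suggests the proportion of variance explained by the first $p$ dynamic FPCs
as a natural measure of how well a functional time series can be represented in
dimension $p$. This proportion is given by

$$\sum_{m \le p} \int_{-\pi}^{\pi} \lambda_m(\theta) \, d\theta \Big/ E\|X_1\|^2. \tag{11}$$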



3. Practical Implementation 

In order to handle observed functional series in practice, some preprocessing of
the data is required. The approach which is commonly taken consists in
representing a curve $x(u)$ which is observed on grid points
$0 \le u_1 < u_2 < \cdots < u_r \le 1$ as a functional observation with a finite
number of basis functions $(v_k : 1 \le k \le d)$, i.e. as
$x(u) = \sum_{k=1}^{d} x_k v_k(u)$. Commonly, Fourier bases, B-splines or
wavelets are used. A good choice of the basis and of the number of basis
functions will depend heavily on the underlying data. The coefficients $x_k$ can
be obtained, for example, via least-squares fitting or some penalized form
thereof. We will not go into details here, but refer, e.g., to Ramsay and
Silverman [30, Chapters 3-5]. Once such a representation is established, the
analysis is reduced to that of the space
$H_d := \mathrm{sp}(v_k : 1 \le k \le d)$ spanned by the $d$ basis functions.
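The least-squares step can be sketched as follows. This is only an illustration in Python/NumPy (the computations reported later in the paper were done in R); the helper names `fourier_basis` and `fit_coefficients`, the ordering of the basis functions and the restriction to odd $d$ are assumptions made for the example, and no roughness penalty is included.

```python
import numpy as np

def fourier_basis(u, d):
    """Evaluate d orthonormal Fourier basis functions on [0, 1] at the points u.

    Assumed ordering: 1, sqrt(2) sin(2 pi u), sqrt(2) cos(2 pi u),
    sqrt(2) sin(4 pi u), ... ; d must be odd.
    """
    if d % 2 == 0:
        raise ValueError("this sketch expects an odd number of basis functions")
    u = np.asarray(u, dtype=float)
    B = np.empty((u.size, d))
    B[:, 0] = 1.0
    for k in range(1, (d - 1) // 2 + 1):
        B[:, 2 * k - 1] = np.sqrt(2.0) * np.sin(2.0 * np.pi * k * u)
        B[:, 2 * k] = np.sqrt(2.0) * np.cos(2.0 * np.pi * k * u)
    return B

def fit_coefficients(u, x, d):
    """Ordinary least-squares fit of the basis coefficients x_1, ..., x_d."""
    B = fourier_basis(u, d)
    coef, *_ = np.linalg.lstsq(B, np.asarray(x, dtype=float), rcond=None)
    return coef

# Toy curve observed at 48 grid points (e.g. half-hourly values within a day).
u = (np.arange(48) + 0.5) / 48
x = 1.5 + 2.0 * np.sqrt(2.0) * np.sin(2 * np.pi * u) - 0.5 * np.sqrt(2.0) * np.cos(4 * np.pi * u)
coef = fit_coefficients(u, x, d=5)   # coefficient vector of the fitted curve
```

Because the toy curve lies exactly in the span of the five basis functions, the fit recovers its coefficients; with noisy observations, a penalized fit may be preferable.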

In the sequel, we write $(a_{ij} : 1 \le i, j \le d)$ for a $d \times d$ matrix
with entry $a_{ij}$ in row $i$ and column $j$.

3.1. Representation in finite dimension

Let $x \in H_d$, i.e. of the form $x = v'\mathbf{x}$, where
$v = (v_1, \ldots, v_d)'$ and $\mathbf{x} = (x_1, \ldots, x_d)'$. We assume that
the basis functions are linearly independent, but they need not be orthogonal.
Any statement about $x$ then can be expressed as an equivalent statement about
$\mathbf{x}$. In particular, if $A : H_d \to H_d$ is a linear operator, then,
for $x \in H_d$,

$$A(x) = \sum_{k=1}^{d} x_k A(v_k) = \sum_{k=1}^{d} \sum_{k'=1}^{d} x_k \langle A(v_k), v_{k'} \rangle v_{k'} = v'\mathfrak{A}\mathbf{x},$$

where $\mathfrak{A}' = (\langle A(v_i), v_j \rangle : 1 \le i, j \le d)$. We call
$\mathfrak{A}$ the corresponding matrix of $A$ and $\mathbf{x}$ the
corresponding vector of $x$.

The following simple results are stated without proof. 

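Lemma 1. Let $A$, $B$ be linear operators on $H_d$ and let $\mathfrak{A}$ and
$\mathfrak{B}$ be their corresponding matrices. Then,

(i) for any $\alpha, \beta \in \mathbb{C}$, the corresponding matrix of
$\alpha A + \beta B$ is $\alpha\mathfrak{A} + \beta\mathfrak{B}$;

(ii) $A(e) = \lambda e$ iff $\mathfrak{A}\mathbf{e} = \lambda\mathbf{e}$, where
$e = v'\mathbf{e}$;

(iii) letting $A := \sum_{i=1}^{d} \sum_{j=1}^{d} g_{ij}\, v_i \otimes v_j$,
$G := (g_{ij} : 1 \le i, j \le d)$, where $g_{ij} \in \mathbb{C}$, and
$V := (\langle v_i, v_j \rangle : 1 \le i, j \le d)$, the corresponding matrix
of $A$ is $\mathfrak{A} = GV'$.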
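Part (iii) can be checked numerically on a small non-orthogonal basis. The sketch below is an illustration only: it uses the monomials $1, u, u^2$ on $[0, 1]$ as a hypothetical basis, real coefficients (so that no complex conjugation is involved), and a Gauss-Legendre rule that integrates the low-degree polynomials occurring here exactly.

```python
import numpy as np

d = 3
# Gauss-Legendre rule mapped from [-1, 1] to [0, 1].
nodes, weights = np.polynomial.legendre.leggauss(8)
u, w = 0.5 * (nodes + 1.0), 0.5 * weights

B = np.vander(u, d, increasing=True)      # B[r, i] = v_i(u_r) = u_r**i
V = B.T @ (w[:, None] * B)                # Gram matrix V[i, j] = <v_i, v_j>

rng = np.random.default_rng(0)
G = rng.standard_normal((d, d))           # coefficients g_ij of the operator A
x_vec = rng.standard_normal(d)            # coefficient vector of x = v' x_vec

# Direct evaluation on the grid: A(x) = sum_i sum_j g_ij <x, v_j> v_i.
inner = B.T @ (w * (B @ x_vec))           # <x, v_j> for j = 1, ..., d
Ax_direct = B @ (G @ inner)

# Evaluation via the corresponding matrix G V' applied to the coefficients.
Ax_matrix = B @ (G @ V.T @ x_vec)
```

Both evaluations of $A(x)$ agree on the grid, and `V` reproduces the Gram matrix of the monomials, whose entries are $1/(i + j + 1)$ for $i, j = 0, 1, 2$.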

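To obtain the corresponding matrix of the spectral density operators
$\mathcal{F}_\theta^X$, first observe that, if
$X_k = \sum_{i=1}^{d} X_{ki} v_i =: v'\mathbf{X}_k$, then

$$C_h^X = E X_h \otimes X_0 = \sum_{i=1}^{d} \sum_{j=1}^{d} E X_{hi} X_{0j}\, v_i \otimes v_j.$$

Let $\mathbf{C}_h^{\mathbf{X}} := E\,\mathbf{X}_h \mathbf{X}_0'$. Then, by
Lemma 1 (iii), we get $\mathfrak{C}_h^X = \mathbf{C}_h^{\mathbf{X}} V'$ as the
corresponding matrix of $C_h^X$, and by the linearity property (i), the
corresponding matrix of $\mathcal{F}_\theta^X$ is

$$\mathfrak{F}_\theta^X = \frac{1}{2\pi} \Big( \sum_{h \in \mathbb{Z}} \mathbf{C}_h^{\mathbf{X}} e^{-ih\theta} \Big) V'. \tag{12}$$

Assume that $\lambda_m(\theta)$ is the $m$-th largest eigenvalue of
$\mathfrak{F}_\theta^X$, with eigenvector $\boldsymbol{\varphi}_m(\theta)$. Then
$\lambda_m(\theta)$ is also an eigenvalue of $\mathcal{F}_\theta^X$ and
$v'\boldsymbol{\varphi}_m(\theta)$ is the corresponding eigenfunction, from
which we can compute, via its Fourier expansion, the dynamic FPCs. In
particular, we have

$$\phi_{mk} = \frac{v'}{2\pi} \int_{-\pi}^{\pi} \boldsymbol{\varphi}_m(s) e^{-iks} \, ds =: v'\boldsymbol{\phi}_{mk},$$

and hence

$$Y_{mt} = \sum_{k \in \mathbb{Z}} \int_0^1 \mathbf{X}_{t-k}' v(u) v'(u) \boldsymbol{\phi}_{mk} \, du = \sum_{k \in \mathbb{Z}} \mathbf{X}_{t-k}' V \boldsymbol{\phi}_{mk}. \tag{13}$$



3.2. Estimators

In view of (12), our main task is to replace the spectral density matrix

$$\mathbf{F}_\theta^{\mathbf{X}} = \frac{1}{2\pi} \sum_{h \in \mathbb{Z}} \mathbf{C}_h^{\mathbf{X}} e^{-ih\theta}$$

of the coefficient sequence $(\mathbf{X}_k)$ by some estimate. For this purpose,
we can use existing multivariate techniques. Classically, we would put, for
$|h| < n$,

$$\hat{\mathbf{C}}_h^{\mathbf{X}} := \frac{1}{n} \sum_{k=h+1}^{n} \mathbf{X}_k \mathbf{X}_{k-h}', \quad h \ge 0, \quad \text{and} \quad \hat{\mathbf{C}}_h^{\mathbf{X}} := \big(\hat{\mathbf{C}}_{-h}^{\mathbf{X}}\big)', \quad h < 0,$$

(recall that we assume throughout that the data have been centered) and use, for
example, some lag window estimator

$$\hat{\mathbf{F}}_\theta^{\mathbf{X}} := \frac{1}{2\pi} \sum_{|h| \le q} w(h/q)\, \hat{\mathbf{C}}_h^{\mathbf{X}} e^{-ih\theta}, \tag{14}$$

where $w$ is some appropriate weight function and $q = q_n \to \infty$. We refer
to Chapters 10-11 in Brockwell and Davis [6] or to Politis [27]. In the examples
of the following sections we shall use the Bartlett kernel $w(x) = 1 - |x|$. We
then set $\hat{\mathfrak{F}}_\theta^X := \hat{\mathbf{F}}_\theta^{\mathbf{X}} V'$
and compute the eigenvalues and eigenfunctions $\hat{\lambda}_m(\theta)$ and
$\hat{\varphi}_m(\theta)$ thereof, which serve as estimators of
$\lambda_m(\theta)$ and $\varphi_m(\theta)$, respectively. We estimate the
filter coefficients by
$\hat{\phi}_{mk} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{\varphi}_m(s) e^{-iks} \, ds$.
Usually, no analytic form of $\hat{\varphi}_m(s)$ will be available, and one has
to perform numerical integration. One may use, for example,

$$\hat{\phi}_{mk} = \frac{1}{2N_\theta + 1} \sum_{j=-N_\theta}^{N_\theta} \hat{\varphi}_m(\pi j / N_\theta)\, e^{-ik\pi j / N_\theta} =: v'\hat{\boldsymbol{\phi}}_{mk}, \quad (N_\theta \gg 1).$$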
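The estimation steps described above (lag-window estimate with Bartlett weights, eigendecomposition at each frequency, numerical Fourier inversion) can be sketched in NumPy as follows. This is a sketch under stated assumptions, not the authors' R implementation: the basis is taken to be orthonormal, so that $V$ is the identity and the estimated matrices are Hermitian; the function names are hypothetical; and the phase of each eigenvector, which the eigendecomposition leaves undetermined, is fixed by an arbitrary convention (a real, positive inner product with the vector of ones).

```python
import numpy as np

def bartlett_spectral_density(X, q, thetas):
    """Lag-window estimate (14) with Bartlett weights w(x) = 1 - |x|.

    X: (n, d) centered coefficient vectors; thetas: frequencies in [-pi, pi].
    Returns an array of shape (len(thetas), d, d).
    """
    n, d = X.shape
    F = np.zeros((len(thetas), d, d), dtype=complex)
    for h in range(-q, q + 1):
        a = abs(h)
        C = X[a:].T @ X[:n - a] / n           # autocovariance matrix at lag |h|
        if h < 0:
            C = C.T                            # C_h = C_{-h}' for h < 0
        phase = np.exp(-1j * h * thetas)       # e^{-i h theta}
        F += (1.0 - a / q) * phase[:, None, None] * C[None, :, :]
    return F / (2.0 * np.pi)

def dfpc_filters(F, thetas, p, L):
    """Leading p eigenpairs at each frequency and filter coefficients, |k| <= L."""
    n_theta, d, _ = F.shape
    lam = np.empty((n_theta, p))
    vec = np.empty((n_theta, p, d), dtype=complex)
    ref = np.ones(d)                           # arbitrary phase reference (assumption)
    for j in range(n_theta):
        w, U = np.linalg.eigh(F[j])            # eigenvalues in ascending order
        for m in range(p):
            u = U[:, d - 1 - m]
            s = u @ ref
            if abs(s) > 0:
                u = u * np.conj(s) / abs(s)    # rotate so that u @ ref is real, positive
            lam[j, m] = w[d - 1 - m]
            vec[j, m] = u
    ks = np.arange(-L, L + 1)
    E = np.exp(-1j * np.outer(ks, thetas))     # e^{-i k theta_j}
    phi = np.einsum("kj,jmd->mkd", E, vec) / n_theta   # average over the grid
    return lam, phi

# Toy example: a 3-dimensional vector AR(1) series standing in for the coefficients.
rng = np.random.default_rng(0)
n, d, N = 300, 3, 50
A = np.array([[0.5, 0.2, 0.0], [0.0, 0.3, 0.1], [0.1, 0.0, 0.4]])
X = np.zeros((n, d))
for t in range(1, n):
    X[t] = A @ X[t - 1] + rng.standard_normal(d)
X -= X.mean(axis=0)

thetas = np.pi * np.arange(-N, N + 1) / N      # grid pi * j / N, j = -N, ..., N
F = bartlett_spectral_density(X, q=10, thetas=thetas)
lam, phi = dfpc_filters(F, thetas, p=2, L=5)   # phi[m, k + L] holds the lag-k coefficients
```

With the phase convention used here, the eigenvectors at $-\theta$ are the complex conjugates of those at $\theta$, so the resulting filter coefficients are real up to rounding. For a non-orthonormal basis, the eigendecomposition would have to be applied to the estimate multiplied by $V'$, which is no longer Hermitian in general.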

Since in our sample we only observe $X_1, \ldots, X_n$, we cannot just substitute
$\hat{\phi}_{mk}$ into (13). Instead, one may define, for example,

$$\hat{Y}_{mt} = \sum_{k=-L}^{L} \mathbf{X}_{t-k}' V \hat{\boldsymbol{\phi}}_{mk}, \quad t \in \{L + 1, \ldots, n - L\}. \tag{15}$$

In this case we lose the first $L$ and the last $L$ observations of our sample.
Such boundary problems for moving averages are well known in time series
analysis (e.g., for exponential smoothing) and can be partly remedied with
properly weighted sums. A simple solution for obtaining $\hat{Y}_{mt}$ when
$1 \le t \le L$ or $n - L + 1 \le t \le n$ is to set
$X_{-L+1} = \cdots = X_0 = 0$ and $X_{n+1} = \cdots = X_{n+L} = 0$.
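The truncated filter (15), together with the zero-padding at the boundaries, amounts to a two-sided moving average of the coefficient vectors. A minimal sketch, assuming real filter coefficients stored as an array with one row per lag $k = -L, \ldots, L$ (the function name and array layout are hypothetical):

```python
import numpy as np

def dfpc_scores(X, phi, V=None):
    """Scores Y_t = sum_{|k| <= L} X_{t-k}' V phi_k for t = 1, ..., n.

    X: (n, d) centered coefficient vectors; phi: (2L+1, d) real filter
    coefficient vectors for lags k = -L, ..., L; V: (d, d) Gram matrix of the
    basis (identity if omitted). Observations outside the sample are set to 0.
    """
    n, d = X.shape
    L = (phi.shape[0] - 1) // 2
    if V is None:
        V = np.eye(d)
    Xpad = np.vstack([np.zeros((L, d)), X, np.zeros((L, d))])
    Y = np.zeros(n)
    for i, k in enumerate(range(-L, L + 1)):
        # Rows L-k, ..., L-k+n-1 of Xpad are X_{t-k} for t = 1, ..., n.
        Y += Xpad[L - k : L - k + n] @ (V @ phi[i])
    return Y

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
phi = np.zeros((5, 3))        # L = 2, lags k = -2, ..., 2
phi[3, 0] = 1.0               # only lag k = 1 is active
Y = dfpc_scores(X, phi)       # Y_t is the first coordinate of X_{t-1}
```

In this toy case the score at time $t$ is the first coordinate of $X_{t-1}$, and the first score is zero because $X_0$ has been set to zero.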

With $\hat{\phi}_{mk}$ defined above, along the same line of argumentation as
before, we obtain a $p$-term dynamic Karhunen-Loève expansion

$$\hat{X}_t = \sum_{m=1}^{p} \sum_{k=-L}^{L} \hat{Y}_{m,t+k}\, \hat{\phi}_{mk}, \quad t \in \{2L + 1, \ldots, n - 2L\}. \tag{16}$$

Parallel to (11), the proportion of variance explained by the first $p$ dynamic
FPCs can be estimated through

$$\mathrm{PV}_{\mathrm{dyn}}(p) := \frac{\pi}{N_\theta} \sum_{m \le p} \sum_{j=-N_\theta}^{N_\theta} \hat{\lambda}_m(\pi j / N_\theta) \Big/ \frac{1}{n} \sum_{k=1}^{n} \|X_k\|^2.$$

Alternatively, we may use $1 - \mathrm{PV}_{\mathrm{dyn}}(p)$ or (see
Proposition 5) the normalized mean squared errors

$$\mathrm{NMSE}(p, L) := \sum_{k=2L+1}^{n-2L} \|X_k - \hat{X}_k\|^2 \Big/ \sum_{k=2L+1}^{n-2L} \|X_k\|^2 \tag{17}$$

as measures for the loss of information when considering a $p$-term dynamic KL
expansion. Notice that $1 - \mathrm{PV}_{\mathrm{dyn}}(p)$ and
$\mathrm{NMSE}(p, L)$ will, in general, not coincide. The latter depends on $L$
and, from this point of view, it may look less practical than
$1 - \mathrm{PV}_{\mathrm{dyn}}(p)$. On the other hand, the determination of
$\hat{Y}_{mt}$ and $\hat{X}_t$ also depends on the choice of $L$, and so
$\mathrm{NMSE}(p, L)$ is a more 'honest' estimate, which we thus recommend.
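The reconstruction step and the normalized mean squared error can be sketched as follows. The code assumes an orthonormal basis (so that $L^2$ norms are Euclidean norms of coefficient vectors) and sets scores outside the sample to zero; the function names and the array layout (`phi[m, k + L]` holding the lag-$k$ coefficient vector of component $m$) are hypothetical choices for this illustration.

```python
import numpy as np

def dfpc_reconstruct(Y, phi):
    """p-term expansion: X_hat_t = sum_m sum_{|k| <= L} Y_{m,t+k} phi_{mk}.

    Y: (n, p) scores; phi: (p, 2L+1, d) filter coefficient vectors for lags
    k = -L, ..., L. Scores outside the sample are set to zero.
    """
    n, p = Y.shape
    L = (phi.shape[1] - 1) // 2
    Ypad = np.vstack([np.zeros((L, p)), Y, np.zeros((L, p))])
    X_hat = np.zeros((n, phi.shape[2]))
    for i, k in enumerate(range(-L, L + 1)):
        # Rows L+k, ..., L+k+n-1 of Ypad are Y_{t+k} for t = 1, ..., n.
        X_hat += Ypad[L + k : L + k + n] @ phi[:, i, :]
    return X_hat

def nmse(X, X_hat, L):
    """Normalized mean squared error over t = 2L+1, ..., n-2L (1-based)."""
    inner = slice(2 * L, X.shape[0] - 2 * L)
    return np.sum((X[inner] - X_hat[inner]) ** 2) / np.sum(X[inner] ** 2)

# Toy check with trivial filters: component m picks coordinate m at lag 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
L = 2
phi = np.zeros((3, 2 * L + 1, 3))
for m in range(3):
    phi[m, L, m] = 1.0
Y = X.copy()                                        # scores for these trivial filters

nmse_full = nmse(X, dfpc_reconstruct(Y, phi), L)    # all three components
nmse_one = nmse(X, dfpc_reconstruct(Y[:, :1], phi[:1]), L)   # first component only
```

With all three trivial components the reconstruction is exact; with one component the error is the share of the squared norm carried by the remaining coordinates.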



3.3. Complexity 



The practical implementation of dynamic functional principal components comes 
along with a number of calculations. In this section, we shall summarize the numer- 
ical costs and compare them with those needed for the computation of static FPCs. 
Of course, efficiency and quality of algorithms play an important role in this
context, and the numerical complexity we provide is related to those algorithms
we used and implemented in our simulation study (Section 4) and the real data
application (Section 5).

In Table 1 we list the parameters on which the computation time depends.

  $n$          sample size
  $d$          number of basis functions used to represent curves
  $N_\theta$   number of integration points $\theta \in [-\pi, \pi]$
  $q$          lag window size in (14)
  $L$          truncation level for filters in (15)
  $p$          number of dynamic FPCs to be computed

Table 1: Parameters involved in computing DFPCs.

Table 2 displays the building blocks required for the computation of dynamic
FPCs, along with the numerical complexity (number of summations, multiplications
and storing operations) involved. All these quantities have to be computed over
a set of different parameter values, which introduces an additional
multiplicative factor, shown in the last column of Table 2. Objects obtained in
step $i - 1$ are stored and can be used for step $i$, for $2 \le i \le 5$. The
computation time $\mathrm{CT}_{\mathrm{dyn}}$ for the dynamic procedure is thus
$O(d \times [ndq + dqN_\theta + pd^2N_\theta + LpN_\theta + Lnpd])$. In
comparison, the computation time $\mathrm{CT}_{\mathrm{stat}}$ for static FPCA
is $O(d \times [nd + pd^2 + np])$. In practice, $q$ and $L$ will be adapted to
the sample size $n$. From our computational experience, we would recommend
putting $q = O(\sqrt{n})$ and $L = O(\sqrt{n})$, while $p$ is usually small and
fixed. Then we have
$\mathrm{CT}_{\mathrm{dyn}} = O(d^2 \times \max\{n^{3/2}, \sqrt{n}N_\theta, dN_\theta\})$.
One may conclude that
$\mathrm{CT}_{\mathrm{dyn}} \le \max\{\sqrt{n}, N_\theta\}\, \mathrm{CT}_{\mathrm{stat}}$.
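To get a feeling for the orders of magnitude, the two expressions can be evaluated for concrete parameter values. The snippet below simply plugs numbers into the leading-order terms, ignoring all constants hidden in the $O(\cdot)$ notation, so the results are rough proxies rather than actual operation counts; the function names are hypothetical and the parameter values are merely of the size used in Section 4.

```python
def ct_dyn(n, d, n_theta, q, L, p):
    """Leading-order cost proxy d * [n d q + d q N + p d^2 N + L p N + L n p d]."""
    return d * (n * d * q + d * q * n_theta + p * d ** 2 * n_theta
                + L * p * n_theta + L * n * p * d)

def ct_stat(n, d, p):
    """Leading-order cost proxy d * [n d + p d^2 + n p] for static FPCA."""
    return d * (n * d + p * d ** 2 + n * p)

dyn = ct_dyn(n=400, d=21, n_theta=400, q=20, L=10, p=3)   # 23713200
stat = ct_stat(n=400, d=21, p=3)                          # 229383
ratio = dyn / stat                                        # roughly 100
```

For these values the dynamic procedure is about two orders of magnitude more expensive than the static one by this proxy, which stays below the bound $\max\{\sqrt{n}, N_\theta\} = 400$.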

  step   object                                                 complexity        multiplicity
  1      $\hat{\mathbf{C}}_h^{\mathbf{X}}$                      $O(nd^2)$         $q$
  2      $\hat{\mathfrak{F}}_\theta^X$                          $O(d^2 q)$        $N_\theta$
  3      $\hat{\lambda}_m(\theta)$, $\hat{\varphi}_m(\theta)$   $O(d^3)$          $pN_\theta$
  4      $\hat{\phi}_{mk}$                                      $O(N_\theta d)$   $pL$
  5      $\hat{Y}_{mk}$                                         $O(d^2 L)$        $pn$

Table 2: Computational complexity of the different objects required in our
procedure. These quantities have to be computed over a set of different
parameter values, the impact of which is reflected in the multiplicative
factors shown in the last column.



4. Simulation study 

In this simulation study, we compare the performance of dynamic FPCA with that
of static FPCA as follows. For a given time series $(X_t)$, we perform a static
and a dynamic FPC analysis. From the resulting scores, we recover two functional
series $(\hat{X}_t^{\mathrm{stat}})$ and $(\hat{X}_t^{\mathrm{dyn}})$ using the
static and dynamic Karhunen-Loève expansions, respectively. Performance is then
measured in terms of the respective normalized mean square errors
$E\|X_t - \hat{X}_t^{\mathrm{stat}}\|^2 / E\|X_t\|^2$ and
$E\|X_t - \hat{X}_t^{\mathrm{dyn}}\|^2 / E\|X_t\|^2$. The latter quantity can be
estimated by $\mathrm{NMSE}(p, L)$ or by $1 - \mathrm{PV}_{\mathrm{dyn}}(p)$.
For the static FPCA, we use the estimate $1 - \mathrm{PV}_{\mathrm{stat}}(p)$,
where $\mathrm{PV}_{\mathrm{stat}}(p)$ is the proportion of variance explained
by the first $p$ static FPCs.

For computations we employed the statistical software R along with the fda
package. The data were represented as discussed in Section 3.1 using Fourier
basis functions $(v_i : 1 \le i \le d)$, where $d = 5, 11, 21$. We then set
$H = \mathrm{sp}(v_i : 1 \le i \le d)$. In each run we sample a matrix $\Psi$
with i.i.d. standard normal entries and normalize it to $\|\Psi\| = \kappa$,
where $\kappa = 0.1, 0.3, 0.6, 0.9$. This matrix is then used as the
corresponding matrix of an operator (with slight abuse of notation, we denote
the operator also by $\Psi$) on $H$. With the operator $\Psi$ we generate 400
observations from an autoregressive Hilbertian process of order 1, defined by
$X_t = \Psi(X_{t-1}) + \varepsilon_t$. The noise $(\varepsilon_t)$ is i.i.d.
Gaussian, obtained as a linear combination of the functions
$(v_i : 1 \le i \le d)$ with i.i.d. standard normal coefficients.
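In terms of basis coefficients, this data-generating mechanism is a vector AR(1) recursion. The sketch below illustrates it in NumPy rather than in R; it assumes that the normalization of $\Psi$ refers to the spectral (operator) norm of the coefficient matrix, and it discards a burn-in period so that the retained observations are approximately stationary. Both choices, like the function name, are assumptions of this example and are not specified in the text.

```python
import numpy as np

def simulate_far1(n, d, kappa, rng, burn_in=100):
    """Simulate n coefficient vectors of X_t = Psi(X_{t-1}) + eps_t.

    Psi has i.i.d. standard normal entries, rescaled so that its spectral norm
    equals kappa (assumed meaning of the normalization). The innovations have
    i.i.d. standard normal coefficients. The first burn_in values are dropped.
    """
    Psi = rng.standard_normal((d, d))
    Psi *= kappa / np.linalg.norm(Psi, 2)     # spectral norm set to kappa
    X = np.zeros((n + burn_in, d))
    for t in range(1, n + burn_in):
        X[t] = Psi @ X[t - 1] + rng.standard_normal(d)
    return X[burn_in:], Psi

rng = np.random.default_rng(0)
X, Psi = simulate_far1(n=400, d=5, kappa=0.6, rng=rng)
```

Since $\kappa < 1$, the recursion is stable and the simulated series can be treated as a realization of a stationary process.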

Efficiency of our method also relies on the estimation of the spectral density
operator. We follow the methodology introduced in Section 3 and use a Bartlett
kernel and $q = 20$ in (14). The numerical integration for obtaining
$\hat{\phi}_{mk}$ is performed on the basis of 400 equidistant integration
points. We test truncation levels $L = 5, 10, 15, 20, 25$ for the filters in
(15).

For each choice of $d$ and $L$, the experiment was repeated 200 times. Results
are presented as boxplots in Figure 2. The dashed lines correspond to the
average of $1 - \mathrm{PV}_{\mathrm{dyn}}(p)$, $p = 1, 2, 3$, while the solid
lines correspond to $1 - \mathrm{PV}_{\mathrm{stat}}(p)$.

We see that in this sense dynamic FPCA quite significantly outperforms static
FPCA. As one can expect, performance increases with the dependence coefficient
$\kappa$.


5. Illustrative data example 

In this section, we compute and interpret the first dynamic FPC score sequence 
of daily air pollution curves and draw a comparison with its static counterpart. 
The observations are half-hourly measurements of the concentration (measured in
$\mu g\,m^{-3}$) of particulate matter with an aerodynamic diameter of less than $10\,\mu m$,
abbreviated as PM10, in ambient air taken in Graz, Austria from October 1, 2010
until March 31, 2011. Following Stadlober et al. [33], a square-root transformation
was applied to the data in order to stabilize the variance and avoid heavy-tailed
observations. The data have already been explored in Aue et al. [1] in the context of
curve prediction. Following their approach, we remove some outliers and a seasonal
(weekly) pattern coming from different traffic intensities on business days and
weekends. Then we use the software R to transform the data to functional data. The
only difference in our approach is that we use 15 Fourier basis functions instead of
B-splines. This simplifies our computations, but can easily be changed. Eventually,
175 daily functional observations, say, $X_1, \ldots, X_{175}$, were obtained, roughly
representing one winter season for which pollution levels are known to be high. They are
displayed in Figure 3.

Next we compute the (estimated) first dynamic functional PC score sequence
$(Y_{1t}^{\mathrm{dyn}} : 1 \le t \le 175)$. To this end, we first center the data by their
empirical mean $\hat\mu(u)$ and then follow the procedure introduced in Section 3; in
particular, we set $q = 15$ in (14) and use the Bartlett kernel to obtain an estimator for
the spectral density operator. From this, we obtain the estimated filter elements
$\hat\phi_{1t}$. Since $\|\hat\phi_{1t}\|$ seems to converge to zero fast, we simply set
$L = 10$ in (15). To obtain DFPC scores $Y_{1t}^{\mathrm{dyn}}$ for the case $1 \le t \le 10$
and $166 \le t \le 175$ we set $X_{-9} = \cdots = X_0 = \hat\mu$ and
$X_{176} = \cdots = X_{185} = \hat\mu$. The corresponding time series
$(Y_{1t}^{\mathrm{dyn}} : 1 \le t \le 175)$ is shown in Figure 4. We shall focus here on one
component only, since the first dynamic FPC already explains 80.2% of the variance. This
should be compared to 73.8% explained by the first static FPC.
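The score computation with truncation level $L$ and the boundary convention just described can be sketched as follows. This is a hypothetical Python illustration: it assumes centered coefficient vectors in an orthonormal basis (so that inner products become dot products and padding with the mean curve amounts to padding with zeros) and real-valued filter coefficients stored in a dictionary indexed by the lag.

```python
def dfpc_scores(xs_centered, phi, L):
    """Scores Y_t = sum_{l=-L}^{L} <X_{t-l}, phi_l> for t = 0, ..., n-1.

    xs_centered: list of centered coefficient vectors (orthonormal basis assumed).
    phi: dict mapping each lag l in {-L, ..., L} to a real coefficient vector.
    Observations outside the sample are replaced by the mean, i.e. by the
    zero vector after centering.
    """
    n = len(xs_centered)
    d = len(xs_centered[0])
    zero = [0.0] * d

    def x(t):
        return xs_centered[t] if 0 <= t < n else zero

    scores = []
    for t in range(n):
        y = 0.0
        for lag in range(-L, L + 1):
            y += sum(a * b for a, b in zip(x(t - lag), phi[lag]))
        scores.append(y)
    return scores
```

With the toy input `dfpc_scores([[1.0], [2.0], [3.0]], {-1: [0.5], 0: [1.0], 1: [0.25]}, 1)` each score combines the previous, current and next observation, with the missing neighbours at both ends contributing zero.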

Figure 2: Boxplots: $\mathrm{NMSE}(p, L)$ with $L = 5, 10, 15, 20, 25$, where for each $L$ the
red, orange and yellow boxplots correspond to $p = 1, 2, 3$, respectively. Solid
lines (red, orange and yellow): $1 - \mathrm{PV}_{\mathrm{stat}}(p)$ with $p = 1, 2, 3$, respectively.
Dashed lines: $1 - \mathrm{PV}_{\mathrm{dyn}}(p)$ with $p = 1, 2, 3$, respectively.
[Panels are titled by $\kappa$ and $d$, e.g. $\kappa = 0.1$ and $\kappa = 0.3$ with $d = 5, 11, 21$; horizontal axis: $L$.]

Figure 3: We display $X_t(u)$, $1 \le t \le 175$, where $X_t(u)$ are the square-root transformed
and detrended daily functional observations of PM10, represented with 15
Fourier basis functions. The solid black line represents the sample mean
curve $\hat\mu(u)$.

Figure 4: The sequence of the first static FPC scores (red) and the dynamic ones
(black). [Horizontal axis: days.]

Figure 4 shows that the static score sequence $(Y_{1t}^{\mathrm{stat}} : 1 \le t \le 175)$ is almost
identical to the dynamic one. This is remarkable, as they have been computed
from quite different methods. To get some interpretation, let us analyze the first
static sample FPC $\hat v_1(u)$, say, and the DFPC filters $\hat\phi_{1t}(u)$. They are displayed
in Figure 5. We see that $\hat v_1(u) \approx 1$ for all $u \in [0,1]$, and hence the FPC score
$Y_{1t}^{\mathrm{stat}} = \int (X_t(u) - \hat\mu(u)) \hat v_1(u)\,du$ roughly is the average deviation of
$X_t(u)$ from the mean. The effect of a large (small) first score corresponds to a large
(small) daily average of $\sqrt{\mathrm{PM10}}$. In view of the similarity of $Y_{1t}^{\mathrm{dyn}}$ and
$Y_{1t}^{\mathrm{stat}}$, it is possible to attribute the same meaning to the dynamic FPC scores.
However, regarding the dynamic KL expansion, dynamic FPC scores should be interpreted
sequentially and not in a static way. To this end, let us take advantage of the fact that
all functions $\hat\phi_{1t}$, $|t| > 1$, are close to zero (see Figure 5) and thus, in the
approximation by a single-term dynamic KL expansion, we roughly have

$$X_t(u) \approx \hat\mu(u) + Y_{1,t-1}^{\mathrm{dyn}}\,\hat\phi_{1,-1}(u) + Y_{1t}^{\mathrm{dyn}}\,\hat\phi_{10}(u) + Y_{1,t+1}^{\mathrm{dyn}}\,\hat\phi_{11}(u).$$

This suggests studying the effect of triplets $(Y_{1,t-1}^{\mathrm{dyn}}, Y_{1t}^{\mathrm{dyn}}, Y_{1,t+1}^{\mathrm{dyn}})$
of consecutive scores on the pollution level of day $t$, which can be done by adding the functions

$$\mathrm{eff}(\delta_1, \delta_2, \delta_3) := \delta_1 \hat\phi_{1,-1}(u) + \delta_2 \hat\phi_{10}(u) + \delta_3 \hat\phi_{11}(u), \qquad \delta_i = \mathrm{const} \times \pm 1,$$

to the overall mean curve $\hat\mu(u)$. We do this in Figure 6 with $\delta_i = \pm 1$. For instance,
the upper left panel shows $\hat\mu(u) + \mathrm{eff}(-1, -1, -1)$; this corresponds to the effect of
three subsequent small DFPC scores. Not surprisingly, they result in a negative
shift of the mean curve. The second panel from the left in the top row shows
$\hat\mu(u) + \mathrm{eff}(-1, -1, +1)$. The picture is similar as before, but now the level increases
as $u$ approaches 1. This has a simple explanation. A large value of $Y_{1,t+1}^{\mathrm{dyn}}$ implies
a large average concentration of $\sqrt{\mathrm{PM10}}$ on day $t + 1$, and since the pollution curves
are highly correlated at the transition from day $t$ to day $t + 1$, this should indeed be
reflected by an increase of $\sqrt{\mathrm{PM10}}$ towards the end of day $t$. By the same line of
argumentation, it becomes clear why the pollution level is low for the rest of the
day. Similar interpretations can be given for the other panels of Figure 6.
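The eight effect curves $\hat\mu(u) + \mathrm{eff}(\delta_1, \delta_2, \delta_3)$ with $\delta_i = \pm 1$ are easy to tabulate once the mean curve and the three central filters have been evaluated on a common grid. The following Python sketch illustrates this. The function name and the argument `scale` (playing the role of the constant in $\delta_i = \mathrm{const} \times \pm 1$) are assumptions of the sketch.

```python
from itertools import product


def effect_curves(mu, phi_minus1, phi_0, phi_plus1, scale=1.0):
    """Return mu(u) + eff(d1, d2, d3) on a grid for all sign triplets.

    All four arguments are lists of function values on the same grid of
    intraday time points; the result maps each triplet to a list of values.
    """
    curves = {}
    for d1, d2, d3 in product((-1, 1), repeat=3):
        curves[(d1, d2, d3)] = [
            m + scale * (d1 * a + d2 * b + d3 * c)
            for m, a, b, c in zip(mu, phi_minus1, phi_0, phi_plus1)
        ]
    return curves
```

Plotting `curves[(-1, -1, 1)]` against the grid would then correspond to the second panel in the top row of Figure 6.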

It is interesting to observe that, in this example, the first dynamic FPC seems to 
take the role of the first two static FPCs. The second static FPC (see the left panel in 
Figure 5) can be interpreted as an intraday trend effect; if the second static score of
day $t$ is large (small), then $X_t(u)$ is increasing (decreasing) over $u \in [0, 1]$. However,
since we are working with sequentially dependent data, we can get information about
such a trend from future and past observations, too. This is exemplified in Figure 1
of Section 1. It shows the ten consecutive curves $X_{71}(u) - \hat\mu(u), \ldots, X_{80}(u) - \hat\mu(u)$
(left panel) and compares them to the single-term static (middle panel) and the
single-term dynamic KL expansion (right panel). We see that the dynamic KL version not
only recovers the level, but also the intraday trend.

Figure 5: First static FPC $\hat v_1(u)$ (solid line left), and second static FPC $\hat v_2(u)$
(dashed line left), and the filters corresponding to first dynamic FPC
(right). The filters $\hat\phi_{1t}(u)$, $t \ge 1$, are dashed and the filters $\hat\phi_{1t}(u)$, $t \le 0$,
are solid. The larger $|t|$, the lighter the curve.



6. Conclusion 



Functional principal component analysis is taking a leading role in the functional 
data literature. As an extremely effective tool for dimension reduction, it is useful 
for empirical data analysis as well as for many FDA-related methods, like functional 
linear models. A frequent situation in practice is that functional data are observed 
sequentially and are serially dependent. For example, this occurs when observations 
stem from a continuous time process which is segmented into smaller units, e.g., 
days. In such cases, classical static FPCs still can be consistently estimated, but, in 
contrast to the i.i.d. setup, they will not lead to an adequate dimension reduction 
technique. 

In this paper, we have proposed a dynamic version of functional PCA which
takes into account a potential serial dependence of the functional observations. In 
the special case of uncorrelated data, the dynamic methodology reduces to the usual 
static FPCA. We have complemented the methodology with (i) practical guidelines 
for implementation, (ii) simulations, (iii) a toy example with PM10 pollution data 
and (iv) a rigorous mathematical theory, including some asymptotics. Our empirical 
work shows that dynamic FPCs have a clear edge over static FPCs in terms of their 
ability to represent dependent functional data in small dimension. While we have 
presented the method for functional ($L^2$-valued) data, our proofs are general and
cover the theory for separable Hilbert spaces. 



Figure 6: Mean curve $\hat\mu(u)$ (solid line) and $\hat\mu(u) + \mathrm{eff}(\delta_1, \delta_2, \delta_3)$ with $\delta_i = \pm 1$.
[Panels are titled by the triplets $(\delta_1, \delta_2, \delta_3)$, e.g. $(-1,-1,-1)$, $(-1,-1,+1)$, $(-1,+1,-1)$, $(-1,+1,+1)$; horizontal axis: intraday time.]



A. General Methodology and Proofs 
A.1. Hilbertian Framework

In this subsection, we give a mathematically rigorous description of the methodology
introduced in Section 2.1. We adopt a more general framework which can
be specialized to the functional setup of Section 2.1. Throughout, $H$ denotes some
(complex) Hilbert space. We work in complex spaces, since our theory is based on a
frequency domain analysis. Nevertheless, all our functional time series observations
$X_t$ are assumed to be real-valued functions. A crucial structural assumption that we
impose is that $H$ is separable, i.e. possesses a countable orthonormal basis (ONB).

Linear operators. We consider the class $\mathcal{L}(H, H')$ of bounded linear operators
between two Hilbert spaces $H$ and $H'$. With a slight abuse of notation, we use $\|\cdot\|$
and $\langle\cdot,\cdot\rangle$ for norm and inner product on both $H$ and $H'$. For $\Psi \in \mathcal{L}(H,H')$,
the operator norm is defined as $\|\Psi\|_{\mathcal L} := \sup_{\|x\|\le 1} \|\Psi(x)\|$. The simplest
operators can be defined via a tensor product $v \otimes w$; then $v \otimes w(z) := v\langle z,w\rangle$.
Every operator $\Psi \in \mathcal{L}(H, H')$ possesses an adjoint $\Psi^* \in \mathcal{L}(H', H)$ which satisfies
$\langle\Psi(x),y\rangle = \langle x,\Psi^*(y)\rangle$ for all $x \in H$ and $y \in H'$. It holds that
$\|\Psi^*\|_{\mathcal L} = \|\Psi\|_{\mathcal L}$. If $H = H'$, then $\Psi$ is called self-adjoint if
$\Psi = \Psi^*$. It is called non-negative definite if $\langle\Psi x, x\rangle \ge 0$ for all $x \in H$.

A linear operator $\Psi$ is said to be Hilbert-Schmidt if we have
$\|\Psi\|_{\mathcal S}^2 := \sum_{k\ge 1} \|\Psi(v_k)\|^2 < \infty$ for some ONB $(v_k : k \ge 1)$ of $H$.
Then $\|\Psi\|_{\mathcal S}$ defines a norm, the so-called Hilbert-Schmidt norm of $\Psi$. It bounds the
operator norm: $\|\Psi\|_{\mathcal L} \le \|\Psi\|_{\mathcal S}$, and can be shown
to be independent of the choice of the ONB. Every Hilbert-Schmidt operator is
compact. The class of Hilbert-Schmidt operators between $H$ and $H'$ defines again
a separable Hilbert space with inner product
$\langle\Psi, \Psi'\rangle_{\mathcal S} := \sum_{k\ge 1}\langle\Psi(v_k), \Psi'(v_k)\rangle$.

If $\Psi \in \mathcal{L}(H, H')$ and $\Upsilon \in \mathcal{L}(H'', H)$, then $\Psi\Upsilon$ is the operator
which maps $x \in H''$ to $\Psi(\Upsilon(x)) \in H'$. Assume that $\Psi$ is a compact operator in
$\mathcal{L}(H, H')$ and let $(s_j^2)$ be the eigenvalues of $\Psi^*\Psi$. Then $\Psi$ is said to be trace
class if $\|\Psi\|_{\mathcal T} := \sum_{j\ge 1} s_j < \infty$. In this case $\|\Psi\|_{\mathcal T}$ defines a norm,
the so-called Schatten 1-norm. We have $\|\Psi\|_{\mathcal S} \le \|\Psi\|_{\mathcal T}$, and hence any
trace-class operator is Hilbert-Schmidt. For self-adjoint non-negative operators, it holds
that $\|\Psi\|_{\mathcal T} = \mathrm{tr}(\Psi) := \sum_{k\ge 1}\langle\Psi(v_k), v_k\rangle$. If $\Psi^*\Psi = \Psi$,
then we have $\mathrm{tr}(\Psi) = \|\Psi\|_{\mathcal S}^2$.

For further background on the theory of linear operators we refer to [12].
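As a quick sanity check on how the three norms relate, consider an operator that is diagonal in some ONB $(v_k)$. This worked example is not part of the original argument; the coefficients $a_k$ are chosen purely for illustration.

```latex
\Psi = \sum_{k\ge 1} a_k\, v_k \otimes v_k, \qquad a_k = 2^{-(k-1)}:
\qquad
\|\Psi\|_{\mathcal L} = \sup_{k} a_k = 1, \quad
\|\Psi\|_{\mathcal S} = \Big(\sum_{k\ge 1} a_k^2\Big)^{1/2} = \sqrt{4/3} \approx 1.155, \quad
\|\Psi\|_{\mathcal T} = \sum_{k\ge 1} a_k = 2 .
```

So $\|\Psi\|_{\mathcal L} \le \|\Psi\|_{\mathcal S} \le \|\Psi\|_{\mathcal T}$, as stated above. With $a_k = 1/k$ instead, $\sum_k a_k^2 = \pi^2/6 < \infty$ while $\sum_k a_k = \infty$, so that operator is Hilbert-Schmidt but not trace class.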

Random sequences in Hilbert spaces. All random elements that appear in the sequel
are assumed to be defined on a common probability space $(\Omega, \mathcal{A}, P)$. We write
$X \in L^p_H(\Omega, \mathcal{A}, P)$ (in short, $X \in L^p_H$) if $E\|X\|^p < \infty$. Every element
$X \in L^1_H$ possesses an expectation, which is the unique $\mu \in H$ satisfying
$E\langle X,y\rangle = \langle\mu, y\rangle$ for all $y \in H$. Provided $X$ and $Y$ are in $L^2_H$, we can define
the cross-covariance operator as $C_{XY} := E(X - \mu_X) \otimes (Y - \mu_Y)$, where $\mu_X$ and $\mu_Y$
are the expectations of $X$ and $Y$, respectively. We have that
$\|C_{XY}\|_{\mathcal T} \le E\|(X - \mu_X) \otimes (Y - \mu_Y)\|_{\mathcal T} = E\|X - \mu_X\|\,\|Y - \mu_Y\|$,
and so these operators are trace-class. An important specific
role is played by the covariance operator $C_{XX}$. This operator is non-negative definite,
self-adjoint with $\mathrm{tr}(C_{XX}) = E\|X - \mu_X\|^2$. We call an $H$-valued process $(X_t)$ (weakly)
stationary, if $(X_t) \in L^2_H$ and if $EX_t$ and $C_{X_{t+h}X_t}$ do not depend on $t$. In this case,
we write $C_h^X$, or shortly $C_h$, for $C_{X_{t+h}X_t}$ if it is clear to which process it belongs. Two
weakly stationary processes $(X_t)$ and $(Y_t)$ are called costationary if $C_{X_{t+h}Y_t}$ does not
depend on $t$. Then we write $C_h^{XY}$ for the covariance operator $C_{X_{t+h}Y_t}$.

Many useful results on random processes in Hilbert spaces or more general Banach 
spaces are collected in Chapters 1 and 2 of [7]. 

Fourier series in Hilbert spaces. For $p \ge 1$, consider the space $L^p_H([-\pi, \pi])$, that is
the space of measurable mappings $x : [-\pi, \pi] \to H$ which satisfy
$\int_{-\pi}^{\pi} \|x(\theta)\|^p\,d\theta < \infty$.
For $p = 2$ this space is again a Hilbert space, with inner product

$$(x,y) := \frac{1}{2\pi} \int_{-\pi}^{\pi} \langle x(\theta), y(\theta)\rangle\,d\theta$$

and norm $\|x\|_2 = \sqrt{(x, x)}$. One can show (see e.g. [7, Lemma 1.4]) that, for any
$x \in L^1_H([-\pi, \pi])$, there exists a unique element $I(x) \in H$ which satisfies

$$\int_{-\pi}^{\pi} \langle x(\theta), v\rangle\,d\theta = \langle I(x), v\rangle \quad \forall v \in H. \tag{18}$$

Then we define $\int_{-\pi}^{\pi} x(\theta)\,d\theta := I(x)$.

For $x \in L^2_H([-\pi, \pi])$ we can now set the $k$-th Fourier coefficient equal to

$$f_k := \frac{1}{2\pi} \int_{-\pi}^{\pi} x(\theta) e^{-\mathrm{i}k\theta}\,d\theta, \quad k \in \mathbb{Z}. \tag{19}$$

Below we write $e_k$ for the function $\theta \mapsto e^{\mathrm{i}k\theta}$, $\theta \in [-\pi, \pi]$.

Proposition 6. Suppose $x \in L^2_H([-\pi, \pi])$ and define $f_k$ by equation (19). Then, the
sequence $S_n := \sum_{k=-n}^{n} f_k e_k$ has a mean square limit in $L^2_H([-\pi,\pi])$. If we denote
the limit by $S$, then $x(\theta) = S(\theta)$ for almost all $\theta$.

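Proof. Let $0 < m < n$ and notice that

$$\|S_n - S_m\|_2^2 = \Big( \sum_{m<|k|\le n} f_k e_k,\ \sum_{m<|\ell|\le n} f_\ell e_\ell \Big)
= \sum_{m<|k|\le n}\ \sum_{m<|\ell|\le n} \langle f_k, f_\ell\rangle\, \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{\mathrm{i}(k-\ell)\theta}\,d\theta
= \sum_{m<|k|\le n} \|f_k\|^2.$$

To prove the first statement, we need to show that $(S_n)$ defines a Cauchy sequence in
$L^2_H([-\pi, \pi])$, which follows if we show that $\sum_{k\in\mathbb{Z}} \|f_k\|^2 < \infty$. We use the fact
that, for any $v \in H$, the function $\langle x(\theta),v\rangle$ belongs to $L^2([-\pi, \pi])$. Then, by
Parseval's identity and (18), we have, for any $v \in H$,

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} |\langle x(\theta),v\rangle|^2\,d\theta
= \sum_{k\in\mathbb{Z}} \Big| \frac{1}{2\pi}\int_{-\pi}^{\pi} \langle x(s),v\rangle e^{-\mathrm{i}ks}\,ds \Big|^2
= \sum_{k\in\mathbb{Z}} |\langle f_k,v\rangle|^2 .$$

Let $(v_k : k \ge 1)$ be an ONB of $H$. Then, by the last result and Parseval's identity
again, it follows that

$$\|x\|_2^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \sum_{\ell\ge 1} |\langle x(\theta), v_\ell\rangle|^2\,d\theta
= \sum_{\ell\ge 1}\sum_{k\in\mathbb{Z}} |\langle f_k, v_\ell\rangle|^2 = \sum_{k\in\mathbb{Z}} \|f_k\|^2 .$$

As for the second statement, we conclude from classical Fourier analysis results
that, for each $v \in H$,

$$\lim_{n\to\infty} \frac{1}{2\pi}\int_{-\pi}^{\pi} \Big| \langle x(\theta), v\rangle
- \sum_{k=-n}^{n} \Big( \frac{1}{2\pi}\int_{-\pi}^{\pi} \langle x(s), v\rangle e^{-\mathrm{i}ks}\,ds \Big) e^{\mathrm{i}k\theta} \Big|^2 d\theta = 0 .$$

Now, by definition of $S_n$, this is equivalent to

$$\lim_{n\to\infty} \frac{1}{2\pi}\int_{-\pi}^{\pi} |\langle x(\theta) - S_n(\theta), v\rangle|^2\,d\theta = 0, \quad \forall v \in H .$$

Combined with the first statement of the proposition and

$$\int_{-\pi}^{\pi} |\langle x(\theta) - S(\theta), v\rangle|^2\,d\theta
\le 2\int_{-\pi}^{\pi} |\langle x(\theta) - S_n(\theta), v\rangle|^2\,d\theta
+ 2\|v\|^2 \int_{-\pi}^{\pi} \|S_n(\theta) - S(\theta)\|^2\,d\theta ,$$

this implies that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |\langle x(\theta) - S(\theta), v\rangle|^2\,d\theta = 0, \quad \forall v \in H. \tag{20}$$

Let $(v_i)$ be an ONB of $H$ and $A_i := \{\theta \in [-\pi, \pi] : \langle x(\theta) - S(\theta), v_i\rangle \ne 0\}$. By (20),
we have that $\lambda(A_i) = 0$ ($\lambda$ denotes the Lebesgue measure), and hence $\lambda(A) = 0$ for
$A = \bigcup_{i\ge 1} A_i$. Consequently, since $(v_i)$ define an ONB, for any $\theta \in [-\pi, \pi]\setminus A$, we have
$\langle x(\theta) - S(\theta), v\rangle = 0$ for all $v \in H$, which in turn implies that $x(\theta) - S(\theta) = 0$. □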
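For intuition, the Fourier coefficients (19) can be approximated numerically when $H$ is finite-dimensional. The following Python sketch uses a Riemann sum on equidistant nodes, in the spirit of the 400-point integration mentioned in the simulation section. The function names and the toy curve with values in $\mathbb{C}^2$ are assumptions made for illustration.

```python
import cmath
import math


def fourier_coefficient(x, k, n_points=400):
    """Approximate f_k = (1/2pi) * int_{-pi}^{pi} x(theta) e^{-ik theta} d theta.

    x maps a frequency theta to a list of complex numbers (an element of C^d);
    the integral is replaced by a Riemann sum on n_points equidistant nodes.
    """
    step = 2.0 * math.pi / n_points
    dim = len(x(0.0))
    acc = [0j] * dim
    for i in range(n_points):
        theta = -math.pi + i * step
        phase = cmath.exp(-1j * k * theta)
        for j, value in enumerate(x(theta)):
            acc[j] += value * phase
    return [a * step / (2.0 * math.pi) for a in acc]


def example_curve(theta):
    # Hypothetical example in H = C^2: x(theta) = (e^{i theta}, 2).
    return [cmath.exp(1j * theta), 2.0]
```

For this toy curve the only nonzero coefficients are $f_1 = (1, 0)$ and $f_0 = (0, 2)$, so the partial sums $S_n$ reproduce $x$ exactly as soon as $n \ge 1$.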

A.2. On the Spectral Density Operator

Assume that the $H$-valued process $(X_t : t \in \mathbb{Z})$ is stationary with lag $h$ autocovariance
operator $C_h^X$ and spectral density operator

$$\mathcal{F}_\theta^X := \frac{1}{2\pi} \sum_{h\in\mathbb{Z}} C_h^X e^{-\mathrm{i}h\theta} .$$

In order to guarantee convergence of this series, we tacitly impose assumption (4)
throughout this section. It can be easily seen that the operator $\mathcal{F}_\theta^X$ is self-adjoint,
non-negative definite and Hilbert-Schmidt. Below, we introduce a weak dependence
assumption established in [17] from which we can derive a sufficient condition for (4).

Definition 3 ($L^p$-$m$-approximability). A random $H$-valued sequence $(X_n : n \in \mathbb{Z})$
is called $L^p$-$m$-approximable if it can be represented as

$$X_n = f(\delta_n, \delta_{n-1}, \delta_{n-2}, \ldots)$$

where the $\delta_i$'s are i.i.d. elements taking values in some measurable space $S$ and $f$ is
a measurable function $f : S^\infty \to H$. Moreover, if $\delta'_1, \delta'_2, \ldots$ are independent copies of
$\delta_1, \delta_2, \ldots$ defined on the same measurable space $S$, then, for

$$X_n^{(m)} := f(\delta_n, \delta_{n-1}, \delta_{n-2}, \ldots, \delta_{n-m+1}, \delta'_{n-m}, \delta'_{n-m-1}, \ldots),$$

we have

$$\sum_{m=1}^{\infty} \big(E\|X_m - X_m^{(m)}\|^p\big)^{1/p} < \infty. \tag{21}$$

Hörmann and Kokoszka [17] show that this notion is widely applicable to linear
and non-linear functional time series. One of its main advantages is that it is a purely
moment-based dependence measure that can be easily verified in many special cases.

Proposition 7. Assume that $(X_t)$ is $L^2$-$m$-approximable. Then (4) holds and the
operators $\mathcal{F}_\theta^X$, $\theta \in [-\pi, \pi]$, are trace-class.

Instead of Assumption (4), Panaretos and Tavakoli [26] impose for the definition
of a spectral density operator summability of $C_h$ in Schatten 1-norm, i.e.
$\sum_{h\in\mathbb{Z}} \|C_h\|_{\mathcal T} < \infty$. Under such slightly more stringent assumption, it
immediately follows that the resulting spectral operator is trace-class. The verification of
convergence may, however, be a bit delicate. At least, we could not find a simple
criterion as in Proposition 7.

Proof of Proposition 7. Without loss of generality, we assume that $EX_0 = 0$. By
independence of $X_0$ and $X_h^{(h)}$, $h \ge 1$, we have

$$\|C_h\|_{\mathcal S} = \|E X_0 \otimes (X_h - X_h^{(h)})\|_{\mathcal S} \le \big(E\|X_0\|^2\big)^{1/2} \big(E\|X_h - X_h^{(h)}\|^2\big)^{1/2} .$$

The first statement of the proposition follows.

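Fix $\theta$. Since $\mathcal{F}_\theta^X$ is non-negative and self-adjoint, it is trace class if and only if

$$\mathrm{tr}(\mathcal{F}_\theta^X) = \sum_{m\ge 1} \langle \mathcal{F}_\theta^X(v_m), v_m\rangle < \infty \tag{22}$$

for some ONB $(v_m)$ of $H$. The trace can be shown to be independent of the choice
of the basis. Define $V_{n,\theta} = (2\pi n)^{-1/2} \sum_{k=1}^{n} X_k e^{-\mathrm{i}k\theta}$ and note that, by stationarity,

$$\mathcal{F}_{n,\theta}^X := E V_{n,\theta} \otimes V_{n,\theta} = \frac{1}{2\pi}\sum_{|h|<n} \Big(1 - \frac{|h|}{n}\Big) E X_0 \otimes X_{-h}\, e^{-\mathrm{i}h\theta} .$$

It is easily verified that the operators $\mathcal{F}_{n,\theta}^X$ again are non-negative and self-adjoint.
Also note that, by the triangular inequality,

$$\|\mathcal{F}_{n,\theta}^X - \mathcal{F}_\theta^X\|_{\mathcal S} \le \sum_{|h|<n} \frac{|h|}{n}\|C_h\|_{\mathcal S} + \sum_{|h|\ge n} \|C_h\|_{\mathcal S} .$$

By application of (4) and Kronecker's lemma, it easily follows that the latter two
terms converge to zero. This implies that $\mathcal{F}_{n,\theta}^X(v)$ converges in norm to $\mathcal{F}_\theta^X(v)$, for
any $v \in H$.

Choose $v_m = \varphi_m(\theta)$. Then, by continuity of the inner product and the monotone
convergence theorem, we have

$$\sum_{m\ge 1} \langle \mathcal{F}_\theta^X(\varphi_m(\theta)), \varphi_m(\theta)\rangle
= \sum_{m\ge 1} \lim_{n\to\infty} \langle \mathcal{F}_{n,\theta}^X(\varphi_m(\theta)), \varphi_m(\theta)\rangle
= \lim_{n\to\infty} \sum_{m\ge 1} \langle \mathcal{F}_{n,\theta}^X(\varphi_m(\theta)), \varphi_m(\theta)\rangle .$$

Using the fact that the $\mathcal{F}_{n,\theta}^X$ are self-adjoint and non-negative, we get

$$\sum_{m\ge 1} \langle \mathcal{F}_{n,\theta}^X(\varphi_m(\theta)), \varphi_m(\theta)\rangle = \mathrm{tr}(\mathcal{F}_{n,\theta}^X) = E\|V_{n,\theta}\|^2
= \frac{1}{2\pi}\sum_{|h|<n} \Big(1 - \frac{|h|}{n}\Big) E\langle X_0, X_h\rangle e^{-\mathrm{i}h\theta} .$$

Since $|E\langle X_0, X_h\rangle| = |E\langle X_0, X_h - X_h^{(h)}\rangle|$, by the Cauchy-Schwarz inequality,

$$\sum_{h\in\mathbb{Z}} |E\langle X_0, X_h\rangle| \le \sum_{h\in\mathbb{Z}} \big(E\|X_0\|^2\big)^{1/2} \big(E\|X_h - X_h^{(h)}\|^2\big)^{1/2} < \infty ,$$

and thus the dominated convergence theorem implies that

$$\mathrm{tr}(\mathcal{F}_\theta^X) = \frac{1}{2\pi}\sum_{h\in\mathbb{Z}} E\langle X_0, X_h\rangle e^{-\mathrm{i}h\theta} \le \frac{1}{2\pi}\sum_{h\in\mathbb{Z}} |E\langle X_0, X_h\rangle| < \infty .$$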

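□

The eigendecomposition of $\mathcal{F}_\theta^X$ gives

$$\mathcal{F}_\theta^X = \sum_{m\ge 1} \lambda_m(\theta)\, \varphi_m(\theta) \otimes \varphi_m(\theta),$$

where $\lambda_1(\theta) \ge \lambda_2(\theta) \ge \cdots$ are the eigenvalues and $\varphi_m(\theta)$ the corresponding
eigenfunctions of $\mathcal{F}_\theta^X$. We require $\|\varphi_m(\theta)\| = 1$ and hence, if $\lambda_m(\theta)$ has multiplicity 1,
then $\varphi_m(\theta)$ is unique up to some rotation $e^{\mathrm{i}\omega}$, $\omega \in [-\pi,\pi]$. Let $\bar x$ be the conjugate
element of $x$, i.e. $\langle \bar x, z\rangle = \langle z, x\rangle$ for all $z \in H$. Then $x$ is real-valued iff $x = \bar x$.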

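Proposition 8. Let $\mathcal{F}_\theta^X$ be the spectral density operator of the stationary sequence
$(X_t)$ for which the summability condition (4) holds. Let $\lambda_1(\theta) \ge \lambda_2(\theta) \ge \cdots$
denote its eigenvalues and $\varphi_m(\theta)$ be the corresponding eigenfunctions. Then, (a)
the functions $\theta \mapsto \lambda_m(\theta)$ are continuous; (b) if we strengthen condition (4) to
$\sum_{h\in\mathbb{Z}} |h|\,\|C_h\|_{\mathcal S} < \infty$, the $\lambda_m(\theta)$'s are Lipschitz-continuous functions of $\theta$;
(c) assuming that $(X_t)$ is real-valued, for each $\theta \in [-\pi, \pi]$, $\lambda_m(\theta) = \lambda_m(-\theta)$ and
$\varphi_m(\theta) = \overline{\varphi_m(-\theta)}$.

Proof. We have (see e.g. [12], p. 186) that the dynamic eigenvalues satisfy
$|\lambda_m(\theta) - \lambda_m(\theta')| \le \|\mathcal{F}_\theta^X - \mathcal{F}_{\theta'}^X\|_{\mathcal S}$. Now,

$$\|\mathcal{F}_\theta^X - \mathcal{F}_{\theta'}^X\|_{\mathcal S} \le \sum_{h\in\mathbb{Z}} \|C_h\|_{\mathcal S}\, |e^{-\mathrm{i}h\theta} - e^{-\mathrm{i}h\theta'}| .$$

The summability condition (4) implies continuity, hence part (a) of the proposition.
Using $|e^{-\mathrm{i}h\theta} - e^{-\mathrm{i}h\theta'}| \le |h|\,|\theta - \theta'|$ yields part (b).
To prove (c), we observe that, for any $\theta \in [-\pi, \pi]$,

$$\lambda_m(\theta)\varphi_m(\theta) = \mathcal{F}_\theta^X(\varphi_m(\theta)) = \frac{1}{2\pi}\sum_{h\in\mathbb{Z}} E X_h \langle \varphi_m(\theta), X_0\rangle e^{-\mathrm{i}h\theta} .$$

Since the eigenvalues $\lambda_m(\theta)$ are real, we obtain, by computing the complex conjugate
of the above equalities,

$$\lambda_m(\theta)\overline{\varphi_m(\theta)} = \frac{1}{2\pi}\sum_{h\in\mathbb{Z}} E X_h \langle \overline{\varphi_m(\theta)}, X_0\rangle e^{\mathrm{i}h\theta} = \mathcal{F}_{-\theta}^X\big(\overline{\varphi_m(\theta)}\big) .$$

This shows that $\lambda_m(\theta)$ and $\overline{\varphi_m(\theta)}$ are eigenvalue and eigenfunction of $\mathcal{F}_{-\theta}^X$, and they
must correspond to a pair $(\lambda_n(-\theta), \varphi_n(-\theta))$; (c) follows. □

Remark 1. The eigenfunctions $\varphi_m(\theta)$ are unique up to multiplication with a number
lying on the complex unit circle. Writing $\varphi_m(\theta) = \overline{\varphi_m(-\theta)}$ more precisely means that
$\varphi_m(\theta) = e^{\mathrm{i}\omega}\,\overline{\varphi_m(-\theta)}$ for some $\omega \in [-\pi,\pi]$.

Since $\|\varphi_m(\theta)\| = 1$, we have that $\varphi_m \in L^2_H([-\pi, \pi])$. Hence, we can expand it in
a Fourier series in the sense explained in the previous section:

$$\varphi_m = \sum_{\ell\in\mathbb{Z}} \phi_{m\ell}\, e_\ell, \quad \text{where} \quad \phi_{m\ell} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \varphi_m(s) e^{-\mathrm{i}\ell s}\,ds .$$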

The coefficients $\phi_{m\ell}$ thus defined give rise to the definition of the dynamic FPCs
in Section 2.



Remark 2. Since $\varphi_m(\theta)$ is Hermitian, it immediately follows that
$\phi_{m\ell} = \overline{\phi_{m\ell}}$, implying that the dynamic FPCs are real if the process $(X_t)$ is real.

A.3. Functional filters

Computation of dynamic FPCs requires applying time-invariant functional filters
to the process $(X_t)$. Let $\Psi = (\Psi_k : k \in \mathbb{Z})$ be a sequence of linear operators,
each mapping between separable Hilbert spaces $H$ and $H'$. Further, let $B$ be the
backshift or lag operator, given by $B^k X_t := X_{t-k}$, $k \in \mathbb{Z}$. Then the functional filter
$\Psi(B) := \sum_{k\in\mathbb{Z}} \Psi_k B^k$, when applied to the sequence $(X_t)$, produces an output series
$(Y_t)$ in $H'$ via

$$Y_t = \Psi(B)X_t = \sum_{k\in\mathbb{Z}} \Psi_k(X_{t-k}). \tag{23}$$

We call $\Psi$ the filter coefficients, and, in the style of scalar or vector time series, we
call the mapping $\Psi_\theta : [-\pi, \pi] \to \mathcal{L}(H, H')$ with

$$\Psi_\theta = \sum_{k\in\mathbb{Z}} \Psi_k e^{-\mathrm{i}k\theta}$$

the frequency response function of the filter $\Psi(B)$.
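In finite dimensions and for a filter with finitely many nonzero coefficients, the frequency response function is a finite sum of matrices. The following Python sketch evaluates it, using the sign convention $\Psi_\theta = \sum_k \Psi_k e^{-\mathrm{i}k\theta}$. The function name and the dictionary representation of the coefficients are assumptions made for illustration.

```python
import cmath


def frequency_response(coeffs, theta):
    """Evaluate Psi_theta = sum_k Psi_k exp(-i k theta).

    coeffs: dict mapping an integer lag k to a matrix (list of rows);
    all matrices are assumed to have the same shape.
    """
    first = next(iter(coeffs.values()))
    rows, cols = len(first), len(first[0])
    out = [[0j] * cols for _ in range(rows)]
    for k, matrix in coeffs.items():
        phase = cmath.exp(-1j * k * theta)
        for i in range(rows):
            for j in range(cols):
                out[i][j] += matrix[i][j] * phase
    return out
```

As a scalar example, the two-term moving average with $\Psi_0 = \Psi_1 = 1/2$ has response $(1 + e^{-\mathrm{i}\theta})/2$, which equals 1 at frequency 0 and vanishes at frequency $\pi$; the filter passes slow variation and suppresses the fastest oscillation.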

Proposition 9. Let $\Psi(B)$ be a functional filter with coefficients satisfying
$\sum_{k\in\mathbb{Z}} \|\Psi_k\|_{\mathcal L} < \infty$, and let $(Y_t)$ be given as in (23). Then,

(a) if $(X_t) \in L^2_H$, the series (23) converges in $L^2_{H'}$;

(b) the sequence $(Y_t)$ is stationary with autocovariance operator

$$C_h^Y = \sum_{k\in\mathbb{Z}}\sum_{\ell\in\mathbb{Z}} \Psi_k C^X_{h-k+\ell} \Psi_\ell^* .$$

Proof. For (a), we need to show that $S_{n,t} = \sum_{k=-n}^{n} \Psi_k(X_{t-k})$ is a Cauchy sequence.
If $m < n$, we get, by application of the Cauchy-Schwarz inequality,

$$E\|S_{n,t} - S_{m,t}\|^2 = \sum_{m<|k|\le n}\ \sum_{m<|\ell|\le n} E\langle \Psi_k(X_{t-k}), \Psi_\ell(X_{t-\ell})\rangle
\le \sum_{m<|k|\le n}\ \sum_{m<|\ell|\le n} \|\Psi_k\|_{\mathcal L}\|\Psi_\ell\|_{\mathcal L}\, E\|X_{t-k}\|\,\|X_{t-\ell}\| ,$$

and thus

$$E\|S_{n,t} - S_{m,t}\|^2 \le \Big( \sum_{|k|>m} \|\Psi_k\|_{\mathcal L} \Big)^2 E\|X_1\|^2 ,$$

which goes to zero as $(m, n) \to \infty$.

To establish (b), first remark that, for two sequences $(Z_n)$ and $(Z'_n)$ in $L^2_H$ with
$E\|Z_n - Z\|^2 \to 0$ and $E\|Z'_n - Z'\|^2 \to 0$, we have that

$$\|EZ_n \otimes Z'_n - EZ \otimes Z'\|_{\mathcal S} \le E\|Z_n \otimes Z'_n - Z \otimes Z'\|_{\mathcal S}
= E\|(Z_n - Z) \otimes Z'_n - Z \otimes (Z' - Z'_n)\|_{\mathcal S}$$
$$\le \big(E\|Z_n - Z\|^2\, E\|Z'_n\|^2\big)^{1/2} + \big(E\|Z\|^2\, E\|Z' - Z'_n\|^2\big)^{1/2} ,$$

and hence $\|EZ_n \otimes Z'_n - EZ \otimes Z'\|_{\mathcal S} \to 0$. Observing moreover that
$E\Psi(Z) \otimes \Upsilon(Z') = \Psi C_{ZZ'} \Upsilon^*$, the result follows from the fact that

$$C_h^Y = \lim_{n\to\infty} E S_{n,t} \otimes S_{n,t-h} = \lim_{n\to\infty} \sum_{|k|\le n}\sum_{|\ell|\le n} E\Psi_k(X_{t-k}) \otimes \Psi_\ell(X_{t-h-\ell}) .$$

□



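Proposition 10. Let $(X_t) \in L^2_H$ and $(X'_t) \in L^2_{H'}$ be two costationary processes with
$\sum_{h\in\mathbb{Z}} \|C_h^{XX'}\|_{\mathcal S} < \infty$ and define their cospectrum as
$\mathcal{F}_\theta^{XX'} := (2\pi)^{-1} \sum_{h\in\mathbb{Z}} C_h^{XX'} e^{-\mathrm{i}h\theta}$.
Set $Y_t := \Psi(B)X_t$ and $Y'_t := \Upsilon(B)X'_t$, where $\Psi(B)$ and $\Upsilon(B)$ are functional
filters with coefficients satisfying the summability conditions $\sum_{k\in\mathbb{Z}} \|\Psi_k\|_{\mathcal S} < \infty$ and
$\sum_{k\in\mathbb{Z}} \|\Upsilon_k\|_{\mathcal S} < \infty$, respectively. Then $(Y_t)$ and $(Y'_t)$ are again costationary and
$\sum_{h\in\mathbb{Z}} \|C_h^{YY'}\|_{\mathcal S} < \infty$. Furthermore,
$\mathcal{F}_\theta^{YY'} = \Psi_\theta \mathcal{F}_\theta^{XX'} \Upsilon_\theta^*$, where $\Upsilon_\theta^* := \sum_{k} \Upsilon_k^* e^{\mathrm{i}k\theta}$.

Proof. It is easy to see that $(Y_t)$ and $(Y'_t)$ are costationary. Similarly as in
Proposition 9, we infer that

$$\sum_{h\in\mathbb{Z}} \|C_h^{YY'}\|_{\mathcal S} \le \sum_{h\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}\sum_{\ell\in\mathbb{Z}} \|\Psi_k\|_{\mathcal S}\,\|\Upsilon_\ell^*\|_{\mathcal S}\,\|C^{XX'}_{h-k+\ell}\|_{\mathcal S} .$$

Then, using $\|\Upsilon^*\|_{\mathcal S} = \|\Upsilon\|_{\mathcal S}$ and summing first over $h$ yields

$$\sum_{h\in\mathbb{Z}} \|C_h^{YY'}\|_{\mathcal S} \le \Big(\sum_{k\in\mathbb{Z}} \|\Psi_k\|_{\mathcal S}\Big) \Big(\sum_{k\in\mathbb{Z}} \|\Upsilon_k\|_{\mathcal S}\Big) \sum_{h\in\mathbb{Z}} \|C_h^{XX'}\|_{\mathcal S} < \infty .$$

Hence the operator $\mathcal{F}_\theta^{YY'}$ is well defined, and we have

$$\mathcal{F}_\theta^{YY'} = \frac{1}{2\pi}\sum_{h\in\mathbb{Z}} C_h^{YY'} e^{-\mathrm{i}h\theta}
= \frac{1}{2\pi}\sum_{h\in\mathbb{Z}} \Big( \sum_{k}\sum_{\ell} \Psi_k C^{XX'}_{h-k+\ell} \Upsilon_\ell^* \Big) e^{-\mathrm{i}h\theta}$$
$$= \frac{1}{2\pi}\sum_{h\in\mathbb{Z}} \Big( \sum_{\ell}\sum_{k} \Psi_k e^{-\mathrm{i}k\theta}\, C^{XX'}_{h-k+\ell}\, e^{-\mathrm{i}(h-k+\ell)\theta}\, \Upsilon_\ell^* e^{\mathrm{i}\ell\theta} \Big)$$
$$= \frac{1}{2\pi}\sum_{\ell}\sum_{k} \Psi_k e^{-\mathrm{i}k\theta} \Big( \sum_{h\in\mathbb{Z}} C^{XX'}_{h-k+\ell}\, e^{-\mathrm{i}(h-k+\ell)\theta} \Big) \Upsilon_\ell^* e^{\mathrm{i}\ell\theta}
= \Psi_\theta \mathcal{F}_\theta^{XX'} \Upsilon_\theta^* ,$$

as was to be shown. □

Corollary 1. Let $(\Psi_k)$ be a functional filter such that $\sum_{k\in\mathbb{Z}} \|\Psi_k\|_{\mathcal S} < \infty$ and let $(Y_t)$
be given as in (23). Assume that $(X_t) \in L^2_H$ is stationary with $\sum_{h\in\mathbb{Z}} \|C_h^X\|_{\mathcal S} < \infty$.
Then $\sum_{h\in\mathbb{Z}} \|C_h^Y\|_{\mathcal S} < \infty$ and $\mathcal{F}_\theta^Y = \Psi_\theta \mathcal{F}_\theta^X \Psi_\theta^*$.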

A.4. Proofs for Section 2

We start by observing that Proposition 1 follows directly from Corollary 1. Part (a)
of Proposition 2 also has been established in the previous Section (see Remark 2).
Part (b) is immediate, and thus we can proceed to the proof of Proposition 3.

Proof of Proposition 3. To prove Part (a) one can proceed along the lines of the
proof of Proposition 9, making use of (4) and

$$\sum_{k\in\mathbb{Z}} \|\phi_{mk}\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \|\varphi_m(\theta)\|^2\,d\theta = 1 .$$

Then one shows that the partial sums $\sum_{k=-n}^{n} \langle X_{t-k}, \phi_{mk}\rangle$ form a Cauchy sequence
in $L^2$ norm, by using

$$|E\langle X_{t-k}, \phi_{mk}\rangle \langle X_{t-\ell}, \phi_{m\ell}\rangle| = |\langle C_{\ell-k}(\phi_{m\ell}), \phi_{mk}\rangle| \le \|C_{\ell-k}\|_{\mathcal L}\, \|\phi_{m\ell}\|\, \|\phi_{mk}\| .$$

Parts (b) and (c) are immediate consequences of Proposition 1 (and Corollary 1
for the general setup). □

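Proof of Propositions 4 and 5. Assume we have filter coefficients $\Psi = (\Psi_k : k \in \mathbb{Z})$
and $\Upsilon = (\Upsilon_k : k \in \mathbb{Z})$ where $\Psi_k : H \to \mathbb{C}^p$ and $\Upsilon_k : \mathbb{C}^p \to H$. If $(X_t)$ and $(Y_t)$ are
$H$-valued and $\mathbb{C}^p$-valued processes, respectively, then there exist elements $\psi_{mk}$ and
$\upsilon_{mk}$ in $H$, such that

$$\Psi(B)(X_t) = \sum_{k\in\mathbb{Z}} \big(\langle X_{t-k}, \psi_{1k}\rangle, \ldots, \langle X_{t-k}, \psi_{pk}\rangle\big)'$$

and

$$\Upsilon(B)(Y_t) = \sum_{\ell\in\mathbb{Z}} \sum_{m=1}^{p} Y_{m,t-\ell}\, \upsilon_{m\ell} .$$

Hence, the $p$-dimensional reconstruction of $X_t$ in Proposition 5 is of the form

$$\sum_{m=1}^{p} X_{mt} = \Upsilon(B)[\Psi(B)X_t] =: \Upsilon\Psi(B)X_t .$$

Letting $\psi_m(\theta) = \sum_{k\in\mathbb{Z}} \psi_{mk} e^{\mathrm{i}k\theta}$ and $\upsilon_m(\theta) = \sum_{\ell\in\mathbb{Z}} \upsilon_{m\ell} e^{\mathrm{i}\ell\theta}$, we obtain,
for $x \in H$ and $y = (y_1, \ldots, y_p)' \in \mathbb{C}^p$, that the frequency response functions $\Psi_\theta$ and
$\Upsilon_\theta$ satisfy

$$\Psi_\theta(x) = \sum_{k\in\mathbb{Z}} \big(\langle x, \psi_{1k}\rangle, \ldots, \langle x, \psi_{pk}\rangle\big)' e^{-\mathrm{i}k\theta} = \big(\langle x, \psi_1(\theta)\rangle, \ldots, \langle x, \psi_p(\theta)\rangle\big)'$$

and

$$\Upsilon_\theta(y) = \sum_{\ell\in\mathbb{Z}} \sum_{m=1}^{p} y_m \upsilon_{m\ell}\, e^{-\mathrm{i}\ell\theta} = \sum_{m=1}^{p} y_m \upsilon_m(-\theta) .$$

Consequently,

$$\Upsilon_\theta \Psi_\theta = \sum_{m=1}^{p} \upsilon_m(-\theta) \otimes \psi_m(\theta). \tag{24}$$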



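Now, using Proposition 10, it can be readily verified that, for $Z_t := X_t - \Upsilon\Psi(B)X_t$,
we obtain the spectral density operator

$$\mathcal{F}_\theta^Z = \mathcal{F}_\theta^X + \Upsilon_\theta\Psi_\theta \mathcal{F}_\theta^X \Psi_\theta^*\Upsilon_\theta^* - \Upsilon_\theta\Psi_\theta \mathcal{F}_\theta^X - \mathcal{F}_\theta^X \Psi_\theta^*\Upsilon_\theta^*
= \big((\mathcal{F}_\theta^X)^{1/2} - \Upsilon_\theta\Psi_\theta (\mathcal{F}_\theta^X)^{1/2}\big)\big((\mathcal{F}_\theta^X)^{1/2} - \Upsilon_\theta\Psi_\theta (\mathcal{F}_\theta^X)^{1/2}\big)^* , \tag{25}$$

where $(\mathcal{F}_\theta^X)^{1/2}$ is such that $(\mathcal{F}_\theta^X)^{1/2}(\mathcal{F}_\theta^X)^{1/2} = \mathcal{F}_\theta^X$. Also, from Proposition 10
it follows that the autocovariances of $(Z_t)$ are summable in Hilbert-Schmidt norm, that is,
$\sum_{h\in\mathbb{Z}} \|C_h^Z\|_{\mathcal S} < \infty$. We can conclude that the integral $\int_{-\pi}^{\pi} \mathcal{F}_\theta^Z\,d\theta$ exists and
is equal to $EZ_t \otimes Z_t$. Therefore,

$$E\|X_t - \Upsilon\Psi(B)X_t\|^2 = \mathrm{tr}\big(E[Z_t \otimes Z_t]\big) = \mathrm{tr}\Big(\int_{-\pi}^{\pi} \mathcal{F}_\theta^Z\,d\theta\Big)
= \int_{-\pi}^{\pi} \mathrm{tr}(\mathcal{F}_\theta^Z)\,d\theta
= \int_{-\pi}^{\pi} \big\|(\mathcal{F}_\theta^X)^{1/2} - \Upsilon_\theta\Psi_\theta (\mathcal{F}_\theta^X)^{1/2}\big\|_{\mathcal S}^2\,d\theta . \tag{26}$$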



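For the sake of rigor, let us justify that we can interchange above the trace and the
integral. To this end, note that $\int_{-\pi}^{\pi} \mathcal{F}_\theta^Z\,d\theta = I(\mathcal{F}^Z)$ if and only if

$$\langle I(\mathcal{F}^Z), V\rangle_{\mathcal S} = \int_{-\pi}^{\pi} \langle \mathcal{F}_\theta^Z, V\rangle_{\mathcal S}\,d\theta \tag{27}$$

for all $V$ in the space of Hilbert-Schmidt operators on $H$. From some ONB $(v_k)$
define $V_N = \sum_{k=1}^{N} v_k \otimes v_k$. Then (27) implies that

$$\mathrm{tr}\big(I(\mathcal{F}^Z)\big) = \lim_{N\to\infty} \sum_{k=1}^{N} \langle I(\mathcal{F}^Z)(v_k), v_k\rangle
= \lim_{N\to\infty} \langle I(\mathcal{F}^Z), V_N\rangle_{\mathcal S}
= \lim_{N\to\infty} \int_{-\pi}^{\pi} \langle \mathcal{F}_\theta^Z, V_N\rangle_{\mathcal S}\,d\theta
= \lim_{N\to\infty} \int_{-\pi}^{\pi} \sum_{k=1}^{N} \langle \mathcal{F}_\theta^Z(v_k), v_k\rangle\,d\theta .$$

Since $\mathcal{F}_\theta^Z$ is non-negative definite for any $\theta$, the monotone convergence theorem
allows us to interchange the limit with the integral.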

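Now, (26) is minimized if we minimize the integrand for every fixed $\theta$ under the
constraint that $\Upsilon_\theta\Psi_\theta$ is of the form (24). Employing the eigendecomposition

$$\mathcal{F}_\theta^X = \sum_{m\ge 1} \lambda_m(\theta)\, \varphi_m(\theta) \otimes \varphi_m(\theta),$$

we infer that

$$(\mathcal{F}_\theta^X)^{1/2} = \sum_{m\ge 1} \sqrt{\lambda_m(\theta)}\, \varphi_m(\theta) \otimes \varphi_m(\theta).$$

The best approximating operator of rank $p$ to $(\mathcal{F}_\theta^X)^{1/2}$ is the operator

$$\sum_{m=1}^{p} \sqrt{\lambda_m(\theta)}\, \varphi_m(\theta) \otimes \varphi_m(\theta).$$

It is obtained if we choose $\Upsilon_\theta\Psi_\theta = \sum_{m=1}^{p} \varphi_m(\theta) \otimes \varphi_m(\theta)$ and hence

$$\psi_m(\theta) = \varphi_m(\theta) \quad \text{and} \quad \upsilon_m(\theta) = \varphi_m(-\theta).$$

Consequently, by Proposition 6 we get

$$\psi_{mk} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \varphi_m(s) e^{-\mathrm{i}ks}\,ds \quad \text{and} \quad
\upsilon_{mk} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \varphi_m(-s) e^{-\mathrm{i}ks}\,ds = \psi_{m,-k} .$$

With this choice, it is clear that $\Upsilon\Psi(B)X_t = \sum_{m=1}^{p} X_{mt}$. Condition (8) assures that
the involved series are mean square convergent. Hence

$$E\Big\|X_t - \sum_{m=1}^{p} X_{mt}\Big\|^2
= \int_{-\pi}^{\pi} \Big\|(\mathcal{F}_\theta^X)^{1/2} - \sum_{m=1}^{p} \sqrt{\lambda_m(\theta)}\, \varphi_m(\theta) \otimes \varphi_m(\theta)\Big\|_{\mathcal S}^2\,d\theta
= \int_{-\pi}^{\pi} \sum_{m>p} \lambda_m(\theta)\,d\theta ;$$

the proof of Proposition 5 follows.

Turning to Proposition 4, observe that by the monotone convergence theorem,
the last integral tends to zero if $p \to \infty$, which gives the proof of Proposition 4. □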



B. Large Sample Properties 

In this appendix, we study the consistency of the estimated dynamic FPC scores. For
the sake of a neat and compact theory, we shall put aside the computational aspects
treated in Section 3. More precisely, we assume that we have fully observed functional
data and that all complicated computation (like integration, eigendecomposition,
etc.) can be performed with arbitrary precision. As we already did throughout this
article, we suppose that $(X_t : t \in \mathbb{Z})$ is a weakly stationary zero mean time series
such that (4) holds. Then, the natural estimator for $Y_{mt}$ is

$$\hat Y_{mt} := \sum_{\ell=-L}^{L} \langle X_{t-\ell}, \hat\phi_{m\ell}\rangle, \quad m = 1, \ldots, p \ \text{ and } \ t = L+1, \ldots, n-L, \tag{28}$$

where $L$ is some integer and $\hat\phi_{m\ell}$ are retrieved from the estimated spectral density
operator $\hat{\mathcal{F}}_\theta^X$. Note that, by a slight abuse of notation, $\hat{\mathcal{F}}_\theta^X$ and $\hat\phi_{m\ell}$ are based on
fully observed and not on approximated data, as introduced in Section 3. We impose
the following assumption.

Assumption B.1 The estimator $\hat{\mathcal{F}}_\theta^X$ is consistent in integrated mean square, i.e.
we have

$$\int_{-\pi}^{\pi} E\|\mathcal{F}_\theta^X - \hat{\mathcal{F}}_\theta^X\|_{\mathcal S}^2\,d\theta \to 0 \quad (n \to \infty). \tag{29}$$

Panaretos and Tavakoli [26] present an estimator which satisfies (29) under some
functional cumulant conditions. Below we will establish an alternative sufficient
condition, involving very mild technical conditions, such that Assumption B.1 holds.
By stating (29) as an assumption, we intend to keep the theory more widely applicable.

Since our method requires the estimation of eigenvectors of the spectral density
operator, we need to introduce the following identifiability assumption.

Assumption B.2 Define $\alpha_1(\theta) := \lambda_1(\theta) - \lambda_2(\theta)$ and
$\alpha_m(\theta) := \min\{\lambda_{m-1}(\theta) - \lambda_m(\theta), \lambda_m(\theta) - \lambda_{m+1}(\theta)\}$ for $m > 1$, where
$\lambda_i(\theta)$ is the $i$-th largest eigenvalue of the spectral density operator evaluated in $\theta$.
Then $\alpha_m(\theta)$ has at most finitely many zeros.
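The eigen-gap in Assumption B.2 is straightforward to evaluate once the dynamic eigenvalues at a frequency are available. The helper below is a hypothetical Python sketch that takes the eigenvalues sorted in decreasing order and a 1-based index $m$.

```python
def eigen_gap(eigenvalues, m):
    """Return alpha_m for eigenvalues sorted in decreasing order (m is 1-based).

    alpha_1 = lambda_1 - lambda_2, and for m > 1
    alpha_m = min(lambda_{m-1} - lambda_m, lambda_m - lambda_{m+1}).
    """
    if m == 1:
        return eigenvalues[0] - eigenvalues[1]
    return min(eigenvalues[m - 2] - eigenvalues[m - 1],
               eigenvalues[m - 1] - eigenvalues[m])
```

Evaluating this on a grid of frequencies gives a rough practical check of where $\alpha_m(\theta)$ comes close to zero, i.e. where the $m$-th eigenfunction is poorly identified.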



Theorem 1. Let $\hat Y_{mt}$ be the random variable defined by (28). If $L = L(n) \to \infty$
sufficiently slowly, then, under Assumptions B.1 and B.2, we have
$\hat Y_{mt} \xrightarrow{P} Y_{mt}$ as $n \to \infty$.

Remark 3. This result does not provide any guidelines how to choose the truncation
level $L$. This is a common problem for infinite dimensional data and usually can
only be overcome by imposing a number of additional technical assumptions, which
cannot be verified in practice. We refer to Hörmann and Kidziński [16] for a similar
problem in the context of functional regression. A solution would be to follow their
approach, and develop a data-driven algorithm for the choice of $L$. This, however,
goes far beyond the scope of this article.

For the proof of Theorem 1, we show that $E|\hat Y_{mt} - Y_{mt}| \to 0$. Since

$$E|\hat Y_{mt} - Y_{mt}| \le E\Big| \sum_{j=-L}^{L} \langle X_{t-j}, \hat\phi_{mj} - \phi_{mj}\rangle \Big|
+ E\Big| \sum_{|j|>L} \langle X_{t-j}, \phi_{mj}\rangle \Big| , \tag{30}$$

the result follows if each summand in (30) converges to zero. This will be proven in
the two subsequent lemmas.

Lemma 2. If L = L(n) — > oo sufficiently slowly, then, under Assumptions B.l and 
B.2, we have that 



(Xk-j,4>mj — 4>mj) 

\3\<L 



Op(l) (n — > oo). 

Proof. The triangle inequality and the Cauchy-Schwarz inequality yield 



\j\<L 



— Il^fe-J'll W&mj — <t>mj\\ 

j=-L 

L 

< max \\4> mj - 4> mj \\ ^ \\ X k- 

j=-L 



3& 



II • 
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Furthermore, Jensen's inequality and Lemma 3.2 in [17J imply that, for any j G Z, 



2vr||0 mi - 0. 



< 



< 



(tp m {0) - m (O))e A 

T 

\<p m (e) - <p m (e)\\d9 

lllf -Tf\\sA2<to. 



a, 



By Assumption B.2, a m (8) has only finitely many zeros, 9x,. . . ,8k, say. Let now 
5 e {9) = [9 — e, 9 + e] and A(m, e) = U^i^e(^)- By definition, the length of 
this set is \A(m,e)\ < 2Ke. Now define M e such that M~ l = mm{a m (9) | 9 e 
[—n,n]\A(m,e)}. By continuity of a m (9) (see Proposition [8]), we have M £ < oo, 
and thus 



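\[
\int_{-\pi}^{\pi} \Big(\frac{\sqrt{8}}{|\alpha_m(\theta)|}\,\|\mathcal{F}^X_\theta - \hat{\mathcal{F}}^X_\theta\|_{\mathcal{S}}\Big) \wedge 2 \; d\theta
\le 4K\varepsilon + \sqrt{8}\, M_\varepsilon \int_{-\pi}^{\pi} \|\mathcal{F}^X_\theta - \hat{\mathcal{F}}^X_\theta\|_{\mathcal{S}}\, d\theta =: B_{n\varepsilon}.
\]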



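By Assumption B.1, there exists a sequence $\varepsilon_n \to 0$ such that $B_{n\varepsilon_n} \to 0$ in probability. Thus, if we choose $L(n)$ such that $L = L(n) \to \infty$ and $L B_{n\varepsilon_n} = o_P(1)$, then
\[
\Big|\sum_{|j|\le L} \langle X_{k-j}, \phi_{mj} - \hat\phi_{mj}\rangle\Big|
\le L B_{n\varepsilon_n} \Big(L^{-1} \sum_{j=-L}^{L} \|X_{k-j}\|\Big).
\tag{31}
\]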



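It remains to show that $L^{-1} \sum_{j=-L}^{L} \|X_{k-j}\| = O_P(1)$. By the imposed weak stationarity we have $E\|X_k\|^2 = E\|X_1\|^2$, and hence, by Markov's inequality, for any $R > 0$,
\[
P\Big(L^{-1} \sum_{j=-L}^{L} \|X_{k-j}\| > R\Big)
\le \frac{\sum_{j=-L}^{L} E\|X_{k-j}\|}{LR}
\le \frac{3\,(E\|X_1\|^2)^{1/2}}{R}.
\]
□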



Lemma 3. Let $L = L(n) \to \infty$. Then, under condition (14), we have
\[
\Big|\sum_{|j|>L} \langle X_{k-j}, \phi_{mj}\rangle\Big| = o_P(1) \quad (n \to \infty).
\]



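Proof. The triangle and the Cauchy-Schwarz inequalities, and elementary algebraic transformations give
\[
\begin{aligned}
E\Big|\sum_{|j|>L} \langle X_{k-j}, \phi_{mj}\rangle\Big|^2
&\le \sum_{|k|>L} \sum_{|l|>L} \|C_{k-l}\|_{\mathcal{L}}\, \|\phi_{mk}\|\, \|\phi_{ml}\| \\
&= \sum_{k\in\mathbb{Z}} \sum_{l\in\mathbb{Z}} \|C_{k-l}\|_{\mathcal{L}}\, \|\phi_{mk}\|\, \|\phi_{ml}\|\, I\{|k|>L\}\, I\{|l|>L\}.
\end{aligned}
\]
Now, setting $h = k - l$, we get
\[
\begin{aligned}
E\Big|\sum_{|j|>L} \langle X_{k-j}, \phi_{mj}\rangle\Big|^2
&\le \sum_{h\in\mathbb{Z}} \sum_{k\in\mathbb{Z}} \|C_h\|_{\mathcal{L}}\, \|\phi_{mk}\|\, \|\phi_{m,k-h}\|\, I\{|k|>L\}\, I\{|h-k|>L\} \\
&\le \sum_{h\in\mathbb{Z}} \|C_h\|_{\mathcal{L}} \sum_{k\in\mathbb{Z}} \|\phi_{mk}\|\, \|\phi_{m,k-h}\|\, I\{|k|>L\} \\
&\le \sum_{h\in\mathbb{Z}} \|C_h\|_{\mathcal{L}} \Big(\sum_{|k|>L} \|\phi_{mk}\|^2\Big)^{1/2} \Big(\sum_{k\in\mathbb{Z}} \|\phi_{mk}\|^2\Big)^{1/2}.
\end{aligned}
\]
The proof follows now from condition (14) and
\[
\sum_{k\in\mathbb{Z}} \|\phi_{mk}\|^2 = 1.
\]
□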



We have stated the consistency result under the assumption of weak stationarity and Assumptions B.1 and B.2. The following proposition shows that Assumption B.1 holds under $L^4$-$m$-approximability. We use the estimator



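\[
\hat{\mathcal{F}}^X_\theta = \frac{1}{2\pi} \sum_{|h|\le q} \Big(1 - \frac{|h|}{q}\Big)\, \hat C_h\, e^{-ih\theta}, \qquad 0 < q < n.
\]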
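As an illustration of how such a lag-window estimator can be evaluated on discretized curves, here is a minimal sketch. It assumes that the estimated autocovariances are already available as $p \times p$ matrices on a grid, that the data are real-valued (so that the adjoint at negative lags is the transpose), and that the spectral density carries the normalization $1/(2\pi)$. Quadrature weights are ignored, and the name `spectral_density_estimate` is hypothetical.

```python
import numpy as np

def spectral_density_estimate(C_hat, q, theta):
    """Bartlett-weighted estimate of the spectral density at frequency theta.

    C_hat : list of (p, p) real arrays; C_hat[h] approximates the lag-h
            autocovariance for h = 0, ..., q
    q     : window size (lags with |h| > q get weight zero)
    theta : frequency in [-pi, pi]
    """
    p = C_hat[0].shape[0]
    F = np.zeros((p, p), dtype=complex)
    for h in range(-q, q + 1):
        # Assumption: for real data on a grid, C_{-h} is the transpose of C_h.
        C_h = C_hat[h] if h >= 0 else C_hat[-h].T
        F += (1.0 - abs(h) / q) * C_h * np.exp(-1j * h * theta)
    return F / (2.0 * np.pi)
```

For white noise with identity covariance the estimate is flat in the frequency, and for a symmetric lag-0 matrix the result is Hermitian at every frequency; both properties are convenient sanity checks.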



Proposition 11. Let $(X_t : t \in \mathbb{Z})$ be an $L^4$-$m$-approximable series and let $q = q(n) \to \infty$ such that $q^3 = o(n)$. Then Assumption B.1 holds.

For the proof, we need the following lemma, which extends a consistency result from [17] for the empirical covariance operator to lag-$h$ autocovariance operators. We define, for $|h| < n$,



\[
\hat C_h = \frac{1}{n} \sum_{k=1}^{n-h} X_{k+h} \otimes X_k, \quad h \ge 0, \qquad \text{and} \qquad \hat C_h = \hat C_{-h}^{*}, \quad h < 0.
\]
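On a grid, the empirical lag-$h$ autocovariance reduces to a matrix product. The sketch below assumes centered, real-valued curves stored row-wise, the normalization $1/n$, and the convention that entry $(u, v)$ of the matrix corresponds to $X_{k+h}(u)\,X_k(v)$. The name `empirical_autocovariance` is hypothetical.

```python
import numpy as np

def empirical_autocovariance(X, h):
    """(1/n) * sum_{k=1}^{n-h} X_{k+h} (x) X_k for h >= 0.

    X : (n, p) array of centered curves; row k is X_k on a grid of p points
    h : integer lag with |h| < n; negative lags return the transpose
        (the adjoint, for real data) of the lag -h matrix
    """
    n = X.shape[0]
    if abs(h) >= n:
        raise ValueError("lag must satisfy |h| < n")
    if h < 0:
        return empirical_autocovariance(X, -h).T
    # Rows of X[h:] are X_{k+h}; rows of X[:n-h] are X_k. The sum of
    # outer products over k is a single matrix product.
    return X[h:].T @ X[:n - h] / n
```

Matrices produced this way for lags $0, \ldots, q$ can serve as the inputs of a lag-window spectral density estimate.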



Lemma 4. Assume that $(X_t : t \in \mathbb{Z})$ is an $L^4$-$m$-approximable series. Then for all $|h| < n$ we have $E\|\hat C_h - C_h\|_{\mathcal{S}} \le U \sqrt{(|h| \vee 1)/n}$, where the constant $U$ depends neither on $n$ nor on $h$.

Proof. Let us only consider the case $h \ge 0$. Define $X_n^{(r)}$ as the $r$-dependent approximation of $(X_n)$ provided by Definition 3. We observe that
\[
nE\|\hat C_h - C_h\|_{\mathcal{S}}^2 = nE\Big\|\frac{1}{n} \sum_{k=1}^{n-h} Z_k\Big\|_{\mathcal{S}}^2,
\]
where $Z_k = X_{k+h} \otimes X_k - C_h$. Set $Z_k^{(r)} = X_{k+h}^{(r)} \otimes X_k^{(r)} - C_h$. Using the stationarity of the sequence $(Z_k)$ we obtain



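\[
\begin{aligned}
nE\Big\|\frac{1}{n} \sum_{k=1}^{n-h} Z_k\Big\|_{\mathcal{S}}^2
&= \sum_{|r|<n-h} \Big(1 - \frac{|r|+h}{n}\Big) E\langle Z_0, Z_r\rangle_{\mathcal{S}}
\le \sum_{|r|<n-h} |E\langle Z_0, Z_r\rangle_{\mathcal{S}}| \\
&\le \sum_{r=-h}^{h} |E\langle Z_0, Z_r\rangle_{\mathcal{S}}| + 2 \sum_{r=h+1}^{\infty} |E\langle Z_0, Z_r\rangle_{\mathcal{S}}|,
\end{aligned}
\tag{32}
\]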



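and using the Cauchy-Schwarz inequality gives
\[
|E\langle Z_0, Z_r\rangle_{\mathcal{S}}| \le E|\langle Z_0, Z_r\rangle_{\mathcal{S}}| \le \sqrt{E\|Z_0\|_{\mathcal{S}}^2\, E\|Z_r\|_{\mathcal{S}}^2} = E\|Z_0\|_{\mathcal{S}}^2.
\]
Furthermore, from $\|X_h \otimes X_0\|_{\mathcal{S}} = \|X_h\|\,\|X_0\|$, we deduce
\[
E\|Z_0\|_{\mathcal{S}}^2 \le E\|X_0\|^2 \|X_h\|^2 \le (E\|X_0\|^4)^{1/2} (E\|X_h\|^4)^{1/2} = E\|X_0\|^4 < \infty.
\]
Consequently, we can bound the first sum in (32) by $(2h+1)\, E\|X_0\|^4$. For the summands of the second term in (32) we obtain by independence of $Z_r^{(r-h)}$ and $Z_0$ that
\[
|E\langle Z_0, Z_r\rangle_{\mathcal{S}}| = |E\langle Z_0, Z_r - Z_r^{(r-h)}\rangle_{\mathcal{S}}| \le (E\|Z_0\|_{\mathcal{S}}^2)^{1/2} (E\|Z_r - Z_r^{(r-h)}\|_{\mathcal{S}}^2)^{1/2}.
\]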

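To conclude, it suffices to show that $\sum_{r=h+1}^{\infty} (E\|Z_r - Z_r^{(r-h)}\|_{\mathcal{S}}^2)^{1/2} \le M < \infty$, where the bound $M$ is independent of $h$. Using an inequality of the type $|ab - cd|^2 \le 2|a|^2 |b - d|^2 + 2|d|^2 |a - c|^2$, we obtain
\[
\begin{aligned}
E\|Z_r - Z_r^{(r-h)}\|_{\mathcal{S}}^2
&= E\|X_{r+h} \otimes X_r - X_{r+h}^{(r-h)} \otimes X_r^{(r-h)}\|_{\mathcal{S}}^2 \\
&\le 2\,(E\|X_r\|^4)^{1/2} (E\|X_{r+h} - X_{r+h}^{(r-h)}\|^4)^{1/2} \\
&\quad + 2\,(E\|X_{r+h}^{(r-h)}\|^4)^{1/2} (E\|X_r - X_r^{(r-h)}\|^4)^{1/2}.
\end{aligned}
\]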

Note that $E\|X_r\|^4 = E\|X_{r+h}^{(r-h)}\|^4 = E\|X_0\|^4$ and that $E\|X_{r+h} - X_{r+h}^{(r-h)}\|^4 = E\|X_r - X_r^{(r-h)}\|^4 = E\|X_0 - X_0^{(r-h)}\|^4$. Altogether we get
\[
E\|Z_r - Z_r^{(r-h)}\|_{\mathcal{S}}^2 \le 4\,(E\|X_0\|^4)^{1/2} (E\|X_0 - X_0^{(r-h)}\|^4)^{1/2}.
\]
Hence, $L^4$-$m$-approximability implies that $\sum_{r=h+1}^{\infty} |E\langle Z_0, Z_r\rangle_{\mathcal{S}}|$ converges and is uniformly bounded over $0 \le h < n$. □
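Proof of Proposition 11. By the triangle inequality,
\[
\begin{aligned}
\|\mathcal{F}^X_\theta - \hat{\mathcal{F}}^X_\theta\|_{\mathcal{S}}
&\le \Big\|\sum_{h=-q}^{q} \Big(1 - \frac{|h|}{q}\Big) (\hat C_h - C_h)\, e^{-ih\theta}\Big\|_{\mathcal{S}}
+ \Big\|\frac{1}{q} \sum_{h=-q}^{q} |h|\, C_h\, e^{-ih\theta}\Big\|_{\mathcal{S}}
+ \Big\|\sum_{|h|>q} C_h\, e^{-ih\theta}\Big\|_{\mathcal{S}} \\
&\le \sum_{h=-q}^{q} \Big(1 - \frac{|h|}{q}\Big) \|\hat C_h - C_h\|_{\mathcal{S}}
+ \frac{1}{q} \sum_{h=-q}^{q} |h|\, \|C_h\|_{\mathcal{S}}
+ \sum_{|h|>q} \|C_h\|_{\mathcal{S}}.
\end{aligned}
\]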



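The last two terms tend to 0 by condition (14) and Kronecker's lemma. For the first term we may use Lemma 4. By taking the expectation, we obtain for some $U_1$ that
\[
\sum_{h=-q}^{q} \Big(1 - \frac{|h|}{q}\Big) E\|\hat C_h - C_h\|_{\mathcal{S}} \le U_1\, \frac{q^{3/2}}{\sqrt{n}}.
\]
Note that the bound does not depend on $\theta$, hence $q^3 = o(n)$ and condition (14) imply that $\sup_{\theta\in[-\pi,\pi]} E\|\mathcal{F}^X_\theta - \hat{\mathcal{F}}^X_\theta\|_{\mathcal{S}} \to 0$ as $n \to \infty$. □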
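The constant $U_1$ in the last bound can be made explicit. The following short derivation is a sketch, assuming that Lemma 4 provides a bound of the form $E\|\hat C_h - C_h\|_{\mathcal{S}} \le U\sqrt{(|h| \vee 1)/n}$ and that $q \ge 1$; it uses only that the weights $1 - |h|/q$ are at most 1 and that $2q + 1 \le 3q$.

```latex
\begin{align*}
\sum_{h=-q}^{q} \Big(1 - \frac{|h|}{q}\Big) E\|\hat C_h - C_h\|_{\mathcal{S}}
  &\le U \sum_{h=-q}^{q} \sqrt{\frac{|h| \vee 1}{n}} \\
  &\le U\,(2q+1)\,\sqrt{\frac{q}{n}} \\
  &\le 3U\,\frac{q^{3/2}}{\sqrt{n}} .
\end{align*}
```

Under these assumptions $U_1 = 3U$ suffices, and the right-hand side tends to zero exactly when $q^3/n \to 0$, which is the growth condition imposed on $q$ in Proposition 11.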